IRE 


Transactions 
on INFORMATION THEORY 


A Journal Devoted to the Theoretical and Experimental Aspects of Information Transmission, Processing and Utilization. 


Volume IT-6 SEPTEMBER, 1960 Number 4 


Published Quarterly 


In This Issue 


On the Statistical Theory of Optimum Demodulation


Mean-Square Noise Power of an Optimum Continuous Filter
Some Quantum Effects in Information Channels 
Spectral Analysis of a Process of Randomly Delayed Pulses 
Binary Codes with Specified Minimum Distance 
On Decoding Linear Error-Correcting Codes 
Encoding and Error-Correcting Procedures for the Bose-Chaudhuri Codes 
Synchronization of Binary Messages 
Analytic Inversion of a Class of Covariance Matrices 
An Isospectral Family of Random Processes


Optimal Mean-Square Systems 


PUBLISHED BY THE
Professional Group on Information Theory 


IRE Professional Group on Information Theory 


The Professional Group on Information Theory is an organization, within the framework of the IRE, of
members with principal professional interest in Information Theory. All members of the IRE are eligible 
for membership in the Group and will receive all Group publications upon payment of an annual fee of 
$4.00. 


ADMINISTRATIVE COMMITTEE 


P. E. Green, Jr. (’60), Chairman G. L. Turin (’62), Vice Chairman 
Lincoln Laboratories Hughes Research Labs. 
Mass. Inst. Tech, Malibu, Calif. 
Lexington, Mass. 


A. G. Schillinger, Secretary-Treasurer 
Polytechnic Institute of Brooklyn 
Brooklyn, N. Y. 


N. M. Abramson 
Elec. Engrg. Dept. 
Stanford University 
Stanford, Calif. 


Peter Elias (’61)
Mass. Inst. Tech.
Cambridge, Mass.


R. A. Silverman
N. Y. U. Inst. of Mathematical Sciences
New York, N. Y.


D. A. Huffman 
Mass. Inst. Tech. 
Cambridge, Mass. 


T. P. Cheatham, Jr. (’62) 
Litton Industries, Inc. 
Beverly Hills, Calif. 

J. L. Kelly, Jr. 
Louis A. deRosa (’61) 
ITT Laboratories 
Nutley, N. J. 


Murray Hill, N. J. 


Ernest R. Kretzmer (’62) 
G. A. Deschamps (’62)
University of Illinois 
Urbana, Ill. 


Murray Hill, N. J. 


F. W. Lehan (’61) 
Space Electronics Corp. 
Glendale, Calif. 


TRANSACTIONS 


A. Kohlenberg, Editor
Melpar, Inc. 
Watertown, Mass. 


P. E. Green, Jr. 
Editorial Policy Committee 
M.I.T. Lincoln Labs. 
Lexington, Mass. 


Bell Telephone Labs., Inc. 


Bell Telephone Labs., Inc. 


F. L. H. M. Stumpers (’62)
N. V. Philips
Gloeilampenfabrieken
Research Laboratories 
Eindhoven, Netherlands 


David Van Meter (’61) 
Litton Industries, Inc. 
Waltham, Mass. 


L. A. Zadeh (’61) 
University of California 
Berkeley, Calif. 


A. Nuttall, Associate Editor 


Melpar, Inc. 


Watertown, Mass. 


Peter Elias 


Editorial Policy Committee 


M.I.T.


Cambridge, Mass. 


IRE Transactions on INFORMATION THEORY is published in March, June, September, and December,
by the IRE for the Professional Group on Information Theory, at 1 East 79th Street, New York 21, 
N. Y. In addition to these regular quarterly issues, Special Issues appear from time to time. Responsibility 
for contents rests upon the authors and not upon the IRE, the Group, or its members. Price per copy: 
IRE-PGIT members, $2.30; IRE members, $3.45; nonmembers, $6.90. 


INFORMATION THEORY 


Copyright © 1960—The Institute of Radio Engineers, Inc.
Printed in U.S.A.


All rights, including translation, are reserved by the IRE. Requests for republication privileges
should be addressed to the Institute of Radio Engineers, 1 E. 79th St., New York 21, N. Y.




TABLE OF CONTENTS 


Contributions 


On the Statistical Theory of Optimum Demodulation  J. B. Thomas and E. Wong  420
On the Mean-Square Noise Power of an Optimum Continuous Filter for Correlated Noise  Marvin Blum  426
Some Quantum Effects in Information Channels  T. E. Stern  435
Spectral Analysis of a Process of Randomly Delayed Pulses  M. V. Johns, Jr.  440
Binary Codes with Specified Minimum Distance  Morris Plotkin  445
On Decoding Linear Error-Correcting Codes—I  Neal Zierler  450
Encoding and Error-Correction Procedures for the Bose-Chaudhuri Codes  W. W. Peterson  459
Synchronization of Binary Messages  E. N. Gilbert  470
Analytic Inversion of a Class of Covariance Matrices  William A. Janos  477
An Isospectral Family of Random Processes  Richard A. Silverman  485
On a Characterization of Processes for which Optimal Mean-Square Systems are of Specified Form  A. V. Balakrishnan  490
Correction to “On New Classes of Matched Filters and Generalizations of the Matched Filter Concept”  David Middleton  501

Correspondence

Remarks on Sine Waves Plus Noise  R. Leipnik  502
Correction to a Paper by D. G. Lampard  I. S. Reed  502
Note on “On Upper Bounds for Error Detecting and Error Correcting Codes of Finite Length”  R. G. Fryer  502
A Note on Single Error Correcting Binary Codes  N. M. Abramson  502
Transmission of Photographic Data by Electrical Transmission  G. Raisbeck and J. Goldhammer  503
A Note of Caution on Square-Law Approximation to an Optimum Detector  J. J. Bussgang and W. L. Mudgett  504

Contributors  506
Book Reviews  508
Abstracts  509


420 


IRE TRANSACTIONS ON INFORMATION THEORY 


September


On the Statistical Theory of Optimum Demodulation* 


J. B. THOMAS†, MEMBER, IRE, AND E. WONG†, MEMBER, IRE


Summary—The multidimensional demodulation problem is 
considered from the point of view of statistical estimation theory 
and a posteriori most probable signal estimates are derived. Cor- 
related signals and noises are treated. This formulation yields a 
set of two matrix integral equations which must be solved for the 
optimum estimates. 

For amplitude modulation, the problem reduces to that of finding 
a set of time varying filters which are, again, solutions to a matrix 
integral equation. Special cases such as two-receiver systems, 
quadrature modulation, and single-sideband have particularly 
simple representations and are considered in some detail. 


AN interesting problem in statistical communication
theory is the “optimum” estimation of modulated
intelligence in the presence of additive noise. For
linear forms of modulation, the problem is essentially that
of linear nonstationary filtering, and application of the
minimum mean-squared error criterion leads to a reasonably
simple integral equation.¹,² Similarly, for nonlinear
modulations, e.g., FM, PM, etc., minimum mean-squared
error nonlinear filtering theory can be applied.³ However,
even with simplifying restrictions,⁴ the resulting mathematics
is formidable and not usually amenable to
explicit solutions. The methods of statistical estimation
theory have been used to obtain a posteriori most probable
estimates of generally modulated Gaussian signals in
Gaussian noise.⁵ This treatment results in two integral
equations which specify the optimum receiver.

An extension of such estimation techniques to the 
multidimensional case is considered here. This extension 
treats the reception of more than one waveform, the 
estimation of more than one signal, and the case where 
signals and noises are correlated. 


FORMULATION 


The use of a posteriori most probable estimation is discussed
in detail in the literature.⁶⁻⁸ It suffices to state


* Received by the PGIT, August 6, 1959.

† Dept. of Elec. Engrg., Princeton University, Princeton, N. J.

1 R. C. Booton, Jr., “An optimization theory for time-varying linear systems with nonstationary statistical inputs,” Proc. IRE, vol. 40, pp. 977-981; August, 1952.

2 R. C. Booton, Jr., and M. H. Goldstein, Jr., “The design and optimization of synchronous demodulators,” 1957 IRE WESCON CONVENTION RECORD, pt. 2, pp. 154-170.

3 L. A. Zadeh, “Optimum nonlinear filters,” J. Appl. Phys., vol. 24, pp. 396-404; April, 1953.

4 Such as restricting the nonlinear filter to be a one-convolution filter.

5 D. C. Youla, “The use of maximum likelihood in estimating continuously modulated intelligence which has been corrupted by noise,” IRE Trans. on Information Theory, vol. IT-3, pp. 90-105; March, 1954.

6 P. M. Woodward and I. L. Davies, “A theory of radar information,” Phil. Mag., ser. 7, vol. 41, pp. 1001-1017; October, 1950.

7 P. M. Woodward and I. L. Davies, “Information theory and inverse probability in telecommunication,” Proc. IEE, vol. 99, pp. 37-44; March, 1952.

8 F. W. Lehan and R. J. Parks, “Optimum demodulation,” 1953 IRE NATIONAL CONVENTION RECORD, pt. 8, pp. 101-103.


here that, given the received waveforms, those signals are
chosen as estimates which have the greatest conditional
likelihood of occurrence.

Let the received waveforms be

$$\bar{r}(u) = \bar{m}[\bar{a}(u), u] + \bar{n}(u), \qquad t - T \le u \le t, \tag{1}$$

where $\bar{r}(u)$, $\bar{m}[\bar{a}(u), u]$, $\bar{a}(u)$, and $\bar{n}(u)$ are column vectors, e.g.,

$$\bar{r}(u) = \begin{bmatrix} r_1(u) \\ r_2(u) \\ \vdots \\ r_q(u) \end{bmatrix}. \tag{2}$$

Here, $\bar{a}(u)$ represents the modulating signals and $\bar{n}(u)$ additive noises. The components of both $\bar{a}(u)$ and $\bar{n}(u)$ are assumed to be correlated Gaussian time series with zero means. The vector $\bar{m}[\bar{a}(u), u]$ is a general modulation function whose form depends on the modulating scheme. It is assumed that this modulation function is differentiable with respect to the elements $a_i(u)$.

In general, the noise vector $\bar{n}(u)$ will have $q$ components as will $\bar{m}[\bar{a}(u), u]$. The signal vector $\bar{a}(u)$ will be taken to have $k$ components where $k$ and $q$ are not necessarily related.

The problem is to find the set of $a_i(u)$, denoted by $\bar{a}^*(u)$, such that the conditional probability $p(\bar{a}/\bar{r})$ is a maximum. Let the joint probability $p(\bar{a}, \bar{n}, \bar{r})$ be written

$$p(\bar{a}, \bar{n}, \bar{r}) = p[(\bar{a}, \bar{n})/\bar{r}]\,p(\bar{r}) = p[\bar{r}/(\bar{a}, \bar{n})]\,p(\bar{a}, \bar{n}) \tag{3}$$

where $p(\bar{a}, \bar{n}, \bar{r})$ is the probability of the simultaneous realizations of $\bar{r}(u)$, $\bar{a}(u)$ and $\bar{n}(u)$ in the interval $t - T \le u \le t$, and a similar definition holds for the other terms. Eq. (3) may be rewritten

$$p[(\bar{a}, \bar{n})/\bar{r}] = \frac{p[\bar{r}/(\bar{a}, \bar{n})]\,p(\bar{a}, \bar{n})}{p(\bar{r})}. \tag{4}$$

If it is noted that

$$p[\bar{r}/(\bar{a}, \bar{n})] = \delta[\bar{n} - (\bar{r} - \bar{m})], \tag{5}$$

where $\delta(x)$ is the Dirac delta-function, then

$$p(\bar{a}/\bar{r}) = \frac{p[\bar{a}, (\bar{r} - \bar{m})]}{p(\bar{r})}. \tag{6}$$

Eq. (6) was obtained by integrating both sides of (4) with respect to $\bar{n}$ and using the relationship of (1). For a given set of received waveforms, $p(\bar{r})$ is a constant; therefore,

$$p(\bar{a}/\bar{r}) = k\,p[\bar{a}, (\bar{r} - \bar{m})]. \tag{7}$$

It is desired to maximize this expression with respect to the elements of $\bar{a}$.
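As a minimal sketch of what maximizing (7) means, consider a scalar special case that is not worked in the paper: one signal sample $a$, one received sample $r = m a + n$, with $a$ and $n$ independent zero-mean Gaussian variables. The names `sig_var` and `noise_var`, the constant gain `m`, and the grid search are illustrative assumptions; the formulation in the text also allows correlated signals and noises, which this sketch ignores.

```python
import math

def log_posterior(a, r, m, sig_var, noise_var):
    """Log of p(a) * p_n(r - m*a), dropping terms that do not depend on a."""
    return -0.5 * a * a / sig_var - 0.5 * (r - m * a) ** 2 / noise_var

def map_closed_form(r, m, sig_var, noise_var):
    """Maximizer of log_posterior, from setting its derivative in a to zero."""
    return m * sig_var * r / (m * m * sig_var + noise_var)

def map_grid_search(r, m, sig_var, noise_var, lo=-5.0, hi=5.0, steps=100001):
    """Brute-force maximizer on a uniform grid, used only as a cross-check."""
    best_a, best_lp = lo, -math.inf
    for i in range(steps):
        a = lo + (hi - lo) * i / (steps - 1)
        lp = log_posterior(a, r, m, sig_var, noise_var)
        if lp > best_lp:
            best_a, best_lp = a, lp
    return best_a

if __name__ == "__main__":
    args = dict(r=1.3, m=2.0, sig_var=1.5, noise_var=0.5)
    print(map_closed_form(**args), map_grid_search(**args))
```

In this scalar Gaussian case the most probable estimate is linear in the received sample, which is the pattern the amplitude-modulation results later in the paper generalize to waveforms.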
ANALYSIS 


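Define $\bar{x}(u)$ to be a column vector

$$\bar{x}(u) = \begin{bmatrix} \bar{a}(u) \\ \bar{n}(u) \end{bmatrix} \tag{8}$$

and the associated covariance function matrix $R(u, v)$ with elements

$$R_{ij}(u, v) = E\{x_i(u)\,x_j(v)\} \tag{9}$$

where $E\{\ \}$ indicates the expectation of the bracketed quantity. It is apparent that $R(u, v)$ can be partitioned as

$$R(u, v) = \begin{bmatrix} R_{aa}(u, v) & R_{an}(u, v) \\ R_{na}(u, v) & R_{nn}(u, v) \end{bmatrix}. \tag{10}$$

It is convenient to use a multidimensional expansion recently introduced,⁹ and to write

$$\bar{x}(u) = \sum_{p=1}^{\infty} a_p\,\bar{\varphi}_p(u), \qquad t - T \le u \le t, \tag{11}$$

where $\bar{\varphi}_p(u)$ is a column vector of $q + k$ components,

$$\bar{\varphi}_p(u) = \begin{bmatrix} \varphi_p^{(1)}(u) \\ \varphi_p^{(2)}(u) \\ \vdots \\ \varphi_p^{(q+k)}(u) \end{bmatrix}. \tag{12}$$

If the $\bar{\varphi}_p(u)$ are the vector eigenfunctions of the matrix integral equation

$$\bar{\varphi}_p(u) = \lambda_p \int_{t-T}^{t} R(u, v)\,\bar{\varphi}_p(v)\,dv, \qquad t - T \le u \le t, \tag{13}$$

then it can be shown⁹ that these vectors $\bar{\varphi}_p(u)$ are orthogonal in the sense that, after normalization,

$$\int_{t-T}^{t} \bar{\varphi}_p'(u)\,\bar{\varphi}_s(u)\,du = \delta_{ps} \tag{14}$$

and that the coefficients $a_p$ are uncorrelated, i.e.,

$$E\{a_p a_s\} = \frac{1}{\lambda_p}\,\delta_{ps}. \tag{15}$$

Since the conditional probability $p(\bar{a}/\bar{r})$ is proportional to the joint probability of the components of $\bar{x}(u)$,

$$p(\bar{a}/\bar{r}) \sim \exp\left(-\tfrac{1}{2} \sum_{p=1}^{\infty} \lambda_p a_p^2\right). \tag{16}$$

Therefore, in order to maximize $p(\bar{a}/\bar{r})$, it is sufficient to minimize the quantity $\sum_{p=1}^{\infty} \lambda_p a_p^2$. It is shown in the Appendix that

$$\sum_{p=1}^{\infty} \lambda_p a_p^2 = \int_{t-T}^{t}\!\int_{t-T}^{t} \bar{x}'(u)\,Q(u, v)\,\bar{x}(v)\,du\,dv \tag{17}$$

where the matrix $Q(u, v)$ satisfies the integral equation

$$\int_{t-T}^{t} R(u, v)\,Q(v, w)\,dv = \delta(u - w)\,\mathbf{1}, \qquad t - T \le u, w \le t, \tag{18}$$

$\mathbf{1}$ being a unit matrix. It is apparent from (83) that the matrix $Q(u, v)$ may be partitioned as

$$Q(u, v) = \begin{bmatrix} Q_{aa}(u, v) & Q_{an}(u, v) \\ Q_{na}(u, v) & Q_{nn}(u, v) \end{bmatrix}. \tag{19}$$

9 This expansion has been used by L. A. Zadeh and one of us in connection with other work not yet published.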
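It is now easy to minimize (17) with respect to the $a_i(u)$. By the familiar techniques of the calculus of variations, the following is obtained:

$$\int_{t-T}^{t} [Q_{aa}(u, v) - M(\bar{a}^*, u)\,Q_{na}(u, v)]\,\bar{a}^*(v)\,dv = \int_{t-T}^{t} [M(\bar{a}^*, u)\,Q_{nn}(u, v) - Q_{an}(u, v)]\,[\bar{r}(v) - \bar{m}(\bar{a}^*, v)]\,dv, \qquad t - T \le u \le t, \tag{20}$$

where the modulation matrix $M(\bar{a}^*, u)$ has the elements

$$M_{ij}(\bar{a}^*, u) = \left.\frac{\partial m_j(\bar{a}, u)}{\partial a_i(u)}\right|_{\bar{a} = \bar{a}^*}. \tag{21}$$

Eq. (20) together with (18) is sufficient for the solution of the $\bar{a}^*(u)$ in terms of the received waveforms $\bar{r}(u)$.

In the special case where the noises are uncorrelated with the signals, (20) and (18) can be used to obtain

$$\bar{a}^*(u) = \int_{t-T}^{t} R_{aa}(u, v)\,M(\bar{a}^*, v)\,\bar{g}(v)\,dv \tag{22}$$

and

$$\bar{r}(u) - \bar{m}(\bar{a}^*, u) = \int_{t-T}^{t} R_{nn}(u, v)\,\bar{g}(v)\,dv, \qquad t - T \le u \le t, \tag{23}$$

where $\bar{g}(v)$ has been written for the expression

$$\int_{t-T}^{t} Q_{nn}(v, w)\,[\bar{r}(w) - \bar{m}(\bar{a}^*, w)]\,dw.$$

In the one-dimensional case, (22) and (23) reduce to those obtained by Youla.⁵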

In principle, the a posteriori most probable demodulator 
has been found. It is only necessary to specify the form of 
modulation and the covariance functions of the signals 
and noises. In practice, the solutions to the equations may 
be prohibitively difficult depending on the form of 
modulation. 


AMPLITUDE MODULATION


General forms of amplitude modulation produce 
relatively simple expressions for the specifying equations 
and will be considered in some detail. In these cases, the 
modulation matrix M is not a function of the signals 
a;(t) and can be written as M(u). Then, the received 
waveforms are 


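$$\bar{r}(u) = M'(u)\,\bar{a}(u) + \bar{n}(u), \tag{24}$$

where $M'$ is the transpose of $M$.
If, furthermore, the noises are uncorrelated with the
signals, manipulation of (22) and (23) yields

$$\bar{a}^*(u) = \int_{t-T}^{t} W(u, v)\,\bar{r}(v)\,dv, \qquad t - T \le u \le t, \tag{25}$$

and

$$\int_{t-T}^{t} W(u, v)\,[M'(v)\,R_a(v, w)\,M(w) + R_n(v, w)]\,dv = R_a(u, w)\,M(w), \qquad t - T \le u, w \le t, \tag{26}$$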


where W(u, v) is a weighting function matrix determined 
from (26). Eq. (26) can also be derived as the specifying 
equation for the minimum mean-squared error non- 
stationary filter. 

Eqs. (25) and (26) may be used to investigate a number 
of special cases of interest. 


Case 1—Multireceiver Systems 


When the noise level at the receiver itself is large compared
to that of the transmission link, it is advantageous
to consider multireceiver systems. Their advantage lies
in the fact that the noises in the various inputs are uncorrelated,
while the signals are either highly correlated
or the same. Applications for these types of systems occur,
for example, in the field of radio astronomy.¹⁰,¹¹

Let us consider a two-receiver system where the received 
waveforms are 


$$r_1(u) = M(u)\,a(u) + n_1(u), \qquad t - T \le u \le t, \tag{27}$$

and

$$r_2(u) = M(u)\,a(u) + n_2(u), \qquad t - T \le u \le t. \tag{28}$$

Then (25) and (26) become

$$a^*(u) = \int_{t-T}^{t} W_1(u, v)\,r_1(v)\,dv + \int_{t-T}^{t} W_2(u, v)\,r_2(v)\,dv, \tag{29}$$

and

$$\int_{t-T}^{t} W_1(u, v)\,[M(v)M(w)R_a(v, w) + R_{n1}(v, w)]\,dv + \int_{t-T}^{t} W_2(u, v)\,M(v)M(w)R_a(v, w)\,dv = R_a(u, w)\,M(w), \tag{30}$$

$$\int_{t-T}^{t} W_2(u, v)\,[M(v)M(w)R_a(v, w) + R_{n2}(v, w)]\,dv + \int_{t-T}^{t} W_1(u, v)\,M(v)M(w)R_a(v, w)\,dv = R_a(u, w)\,M(w). \tag{31}$$

10 R. H. Dicke, “The measurement of thermal radiation at microwave frequencies,” Rev. Sci. Instr., vol. 17, pp. 268-275; July, 1946.

11 S. J. Goldstein, “A comparison of two radiometer circuits,” Proc. IRE, vol. 43, pp. 1663-1666; November, 1955.

It is interesting to note that if $R_{n1}(v, w) = R_{n2}(v, w) = R_n(v, w)$, then the symmetry of (30) and (31) implies that

$$W_1(u, v) = W_2(u, v) \triangleq W(u, v), \tag{32}$$

and (30) and (31) reduce to

$$\int_{t-T}^{t} W(u, v)\,[2M(v)M(w)R_a(v, w) + R_n(v, w)]\,dv = R_a(u, w)\,M(w), \tag{33}$$

while the corresponding equation for one dimension is

$$\int_{t-T}^{t} W(u, v)\,[M(v)M(w)R_a(v, w) + R_n(v, w)]\,dv = R_a(u, w)\,M(w). \tag{34}$$

A comparison of (33) and (34) indicates the advantage
of a two-receiver system. Effectively, the signal level
relative to noise is doubled.
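The doubling can be checked on a single-sample analogue of (33) and (34). This is only an illustrative sketch, not the paper's waveform problem: the signal is one Gaussian variable of variance `S`, each receiver adds independent noise of variance `N`, the modulation gain is taken as 1, and the function names are hypothetical.

```python
def mse_one_receiver(S, N):
    """Mean-square error of a* = W*r with W*(S + N) = S, the scalar analogue of (34)."""
    W = S / (S + N)
    return S - 2 * W * S + W * W * (S + N)

def mse_two_receivers(S, N):
    """Mean-square error of a* = W*(r1 + r2) with W*(2S + N) = S, the analogue of (33)."""
    W = S / (2 * S + N)
    return S - 4 * W * S + W * W * (4 * S + 2 * N)

if __name__ == "__main__":
    S, N = 1.0, 0.5
    # Two receivers with noise N behave like one receiver with noise N/2.
    print(mse_two_receivers(S, N), mse_one_receiver(S, N / 2))
```

Under these assumptions two receivers with noise variance `N` give the same error as one receiver with noise variance `N/2`, which is the sense in which the signal level relative to noise is doubled.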


Case 2—Multiplex Systems 


Various multiplex modulation schemes are used in
communication. They have the common characteristic
that more than one signal is transmitted simultaneously
on a time or frequency sharing basis.¹²


Quadrature Modulation 


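One of the most familiar examples of multiplexing is
quadrature modulation, which, in the formulation discussed
here, has a particularly simple representation.

Let the received waveform be

$$r(u) = \cos \omega_0 u\; a_1(u) + \sin \omega_0 u\; a_2(u) + n(u), \qquad t - T \le u \le t. \tag{35}$$

Then, (25) and (26) become

$$a_1^*(u) = \int_{t-T}^{t} W_1(u, v)\,r(v)\,dv, \tag{37}$$

$$a_2^*(u) = \int_{t-T}^{t} W_2(u, v)\,r(v)\,dv, \tag{38}$$

and

$$\int_{t-T}^{t} W_1(u, v)\,[\cos \omega_0 v\,R_{a11}(v, w) \cos \omega_0 w + \cos \omega_0 v\,R_{a12}(v, w) \sin \omega_0 w + \sin \omega_0 v\,R_{a21}(v, w) \cos \omega_0 w + \sin \omega_0 v\,R_{a22}(v, w) \sin \omega_0 w + R_n(v, w)]\,dv = R_{a11}(u, w) \cos \omega_0 w + R_{a12}(u, w) \sin \omega_0 w, \tag{39}$$

$$\int_{t-T}^{t} W_2(u, v)\,[\cos \omega_0 v\,R_{a11}(v, w) \cos \omega_0 w + \cos \omega_0 v\,R_{a12}(v, w) \sin \omega_0 w + \sin \omega_0 v\,R_{a21}(v, w) \cos \omega_0 w + \sin \omega_0 v\,R_{a22}(v, w) \sin \omega_0 w + R_n(v, w)]\,dv = R_{a21}(u, w) \cos \omega_0 w + R_{a22}(u, w) \sin \omega_0 w. \tag{40}$$

12 H. S. Black, “Modulation Theory,” D. Van Nostrand Co., Inc., New York, N. Y.; 1953.

It should be noted that the kernels of (39) and (40) are
identical.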


Single Sideband


Although single-sideband amplitude modulation is
basically a one-dimensional problem, it can be conveniently
treated as a special case of quadrature modulation.
In this case,

$$r(u) = \cos \omega_0 u\; a(u) + \sin \omega_0 u\; \hat{a}(u) + n(u), \qquad t - T \le u \le t, \tag{41}$$

where $\hat{a}(u)$, the Hilbert transform of $a(u)$, is defined¹³ by


(42) 


In the case when the signal and noise are stationary, the
representation simplifies even further. If we define
$R_a(u) = E\{a(t)\,a(t + u)\}$, then the following relationships
are easily derived:

$$E\{a(t)\,\hat{a}(t + u)\} = \hat{R}_a(u), \tag{43}$$

$$E\{\hat{a}(t)\,a(t + u)\} = -\hat{R}_a(u), \quad \text{and} \tag{44}$$

$$E\{\hat{a}(t)\,\hat{a}(t + u)\} = R_a(u). \tag{45}$$
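Relations (43)-(45) can be checked on the simplest stationary example, a sinusoid with uniformly distributed random phase. The sketch below is an illustration only: it assumes the convention in which the Hilbert transform of a cosine is the sine of the same argument, and the frequency `OMEGA` and all function names are hypothetical.

```python
import math

OMEGA = 2.0  # assumed signal frequency, chosen only for illustration

def a(t, phi):
    """Random-phase sinusoid a(t) = cos(OMEGA*t + phi)."""
    return math.cos(OMEGA * t + phi)

def a_hat(t, phi):
    """Hilbert transform of a(t), taking the transform of cos to be sin (assumed convention)."""
    return math.sin(OMEGA * t + phi)

def R_a(u):
    """Autocorrelation of a(t) when phi is uniform on [0, 2*pi)."""
    return 0.5 * math.cos(OMEGA * u)

def R_a_hat(u):
    """Hilbert transform of R_a(u) under the same convention."""
    return 0.5 * math.sin(OMEGA * u)

def corr(x, y, t, u, n=360):
    """E{x(t) y(t+u)} as an average over n equally spaced phases."""
    return sum(x(t, 2 * math.pi * k / n) * y(t + u, 2 * math.pi * k / n)
               for k in range(n)) / n

if __name__ == "__main__":
    t, u = 0.3, 0.7
    print(corr(a, a_hat, t, u), R_a_hat(u))       # relation (43)
    print(corr(a_hat, a, t, u), -R_a_hat(u))      # relation (44)
    print(corr(a_hat, a_hat, t, u), R_a(u))       # relation (45)
```

The averages do not depend on `t`, as stationarity requires, and the sign change between (43) and (44) is what makes the cross terms combine into a sine of the time difference in the single-sideband kernel.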


Using these expressions, we find that (39) reduces to

$$\int_{t-T}^{t} W_1(u, v)\,[R_a(w - v) \cos \omega_0 (w - v) + \hat{R}_a(w - v) \sin \omega_0 (w - v) + R_n(w - v)]\,dv = R_a(w - u) \cos \omega_0 w + \hat{R}_a(w - u) \sin \omega_0 w. \tag{46}$$

In this case, $W_2(u, v)$ is of no interest since it gives the
estimate of $\hat{a}(u)$. It should be noted that $\hat{R}_a(u)$ is an odd
function of $u$, and therefore, the kernel of (46) remains
symmetric with respect to the variables $v$ and $w$, as it
must.

Quadrature Modulation Example 


In the special case where $a_1(u)$, $a_2(u)$ and $n(u)$ are
stationary, and where $a_1(u)$ and $a_2(u)$ are uncorrelated
and have the same autocorrelation function, (39) and
(40) reduce to

$$\int_{t-T}^{t} W_1(u, v)\,[R_a(w - v) \cos \omega_0 (w - v) + R_n(w - v)]\,dv = R_a(w - u) \cos \omega_0 w \tag{47a}$$

and

$$\int_{t-T}^{t} W_2(u, v)\,[R_a(w - v) \cos \omega_0 (w - v) + R_n(w - v)]\,dv = R_a(w - u) \sin \omega_0 w, \tag{47b}$$

where

$$R_a(w - v) = R_{a11}(v, w) = R_{a22}(v, w). \tag{48}$$

13 E. C. Titchmarsh, “Introduction to the Theory of Fourier Integrals,” Oxford University Press, London, England; 1937.


Note that the integral equations (47a) and (47b) have
kernels which are functions of the difference of two
variables and can be solved easily by standard techniques.¹⁴

In order to obtain an indication of the forms of the
optimum demodulators, we shall give an example with
explicit solutions. Let

$$R_a(u) = A^2 e^{-\alpha |u|}, \tag{49}$$

$$R_n(u) = N_0\,\delta(u), \tag{50}$$

and the received waveform be given for the range $-\infty$ to $t$.
Then, with a change in variables, (37) and (47a) become

$$a_1^*(t) = \int_0^{\infty} W_1(t, u)\,r(t - u)\,du \tag{51}$$

and

$$\int_0^{\infty} W_1(t, u)\,[A^2 e^{-\alpha |v - u|} \cos \omega_0 (v - u) + N_0\,\delta(v - u)]\,du = A^2 e^{-\alpha v} \cos \omega_0 (t - v). \tag{52}$$


Eq. (52) can be solved in the usual way¹⁴ and $W_1(t, u)$
is found to have the form

$$W_1(t, u) = h_{11}(u) \cos \omega_0 (t - u) + h_{12}(u) \sin \omega_0 (t - u). \tag{53}$$


In other words, the receiver can be represented as shown
in Fig. 1. This is a form of synchronous receiver with
specific stationary filters.

Fig. 1—An optimum quadrature demodulator.

For convenience, we define the constants
ree 
= 54 
a ONG G» 
and 
pa, (55) 
Wo . 


and consider two cases. 


14 L. A. Zadeh and J. R. Ragazzini, “An extension of Wiener’s
theory of prediction,” J. Appl. Phys., vol. 21, pp. 645-655; July,
1950.


1) For the case where $\delta^2 < 4(1 + \epsilon)/\epsilon^2$, the filters are given by


his(u) = ae °*“[K, cos wo(1 — bu — Ke cos wo(1 + b)u 


+ Kysin w(1 — bu — Kysin w(1 + bu] 


and 


(56) 


hy2(u) = we °*“[K; cos w(1 — b)u — Ky cos a1 + bu 


with constants 


ee = OIE aay se Get Hy] 


2b 2 
7 b@= Das + Gi 
ae 2b ) 
_a— die’@—-—)’+ — 0)’ 
Kz =o 2b ) 
pe Oe I ae De (=) 
ye ae Db ) 
and 
a = 5 (3106 + D+ 200%" + IN” 
+ (1 + 6p" — 1}? 
and 


b= {2106 1)° + 26676" + 1)" 
= ONES cs Se a eet 


(57) 


(58) 


(62) 


(63) 


2) For the case where $\delta^2 > 4(1 + \epsilon)/\epsilon^2$, the filters are given by
hys(u) = ae * “(K$ sin wu — K{ cos wou) 
+ we? “(KS cos wu — Ki sin wot) 
and 
hyo(u) = we “(KS cos wou + K{ sin wou) 


— we” “(Ki sin wu + K4 cos wou), 


where 

a= ; (Cele iad pitta Seca 
ee ; CR ee lg eT hea TOP oracle eis 
and 


Ba’ — 1)* +1 
B(b) — a’) ’ 


jt P=) Ceo Aaa 2 
ae ly ae [8 (a 1) fail 


a= 


pal BO! al)" et 
ae gate) Bh 


(64) 


(65) 


(66) 


(67) 


(68) 


(69) 


(70) 




hie Pe ey (7 

b—@ 

It should be noted that, despite their complexity, the
filters can be synthesized as lumped-constant R-L-C
networks for any given $\delta$ and $\epsilon$.

It is interesting to consider some limiting cases of this
example:

1) $\epsilon \to 0$ (very small signal power),

$$h_{11}(u) \to \omega_0\,\delta\,\epsilon\,e^{-\alpha u}, \tag{72}$$

$$h_{12}(u) \to 0 \tag{73}$$

to the first order in $\epsilon$.
2) $\epsilon \to \infty$ (noise power becomes negligible),


(k — a)? + 0% (7 


hyy(u) > 2 2 sin wyue™ + 5(u), 
0 
where 
k = (? + wo)” (7 
and | 
hio(u) Aas) cos woue ” + iS d(u). (7 
Wo Wo 


In the same way as before, the optimum estimate
$a_2^*(t)$ can be found to be

$$a_2^*(t) = \int_0^{\infty} W_2(t, u)\,r(t - u)\,du \tag{77}$$

with

$$W_2(t, u) = h_{21}(u) \cos \omega_0 (t - u) + h_{22}(u) \sin \omega_0 (t - u). \tag{78}$$

It should be noted that $h_{21}(u)$ and $h_{22}(u)$ are simply
related to $h_{12}(u)$ and $h_{11}(u)$; in fact,

$$h_{21}(u) = -h_{12}(u), \tag{79}$$

and

$$h_{22}(u) = h_{11}(u). \tag{80}$$
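The synchronous structure of the receiver, multiplication by the two quadrature carriers followed by filtering, can be sketched as follows. This is a hedged illustration only: the optimum filters $h_{11}$, $h_{12}$ of the text are replaced by a plain average over a whole number of carrier periods, the messages are held constant, no noise is added, and `quadrature_demodulate` is a hypothetical name.

```python
import math

def quadrature_demodulate(r, w0, t_start, t_end, steps=20000):
    """Recover (a1, a2) from r(t) = a1*cos(w0*t) + a2*sin(w0*t).

    The received waveform is multiplied by each carrier and averaged with
    a midpoint rule; the averaging stands in for the filters of the text.
    """
    dt = (t_end - t_start) / steps
    acc_cos = acc_sin = 0.0
    for i in range(steps):
        t = t_start + (i + 0.5) * dt
        acc_cos += r(t) * math.cos(w0 * t) * dt
        acc_sin += r(t) * math.sin(w0 * t) * dt
    span = t_end - t_start
    # The carriers have mean-square value 1/2 over whole periods, hence the factor 2.
    return 2.0 * acc_cos / span, 2.0 * acc_sin / span

if __name__ == "__main__":
    w0 = 2.0 * math.pi * 5.0          # five carrier cycles per unit time
    a1, a2 = 0.8, -0.3
    received = lambda t: a1 * math.cos(w0 * t) + a2 * math.sin(w0 * t)
    print(quadrature_demodulate(received, w0, 0.0, 2.0))
```

Because the two carriers are orthogonal over whole periods, each branch rejects the other message; the optimum filters described above refine this basic structure by weighting the products according to the signal and noise statistics.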


AM DEMODULATION WITH DELAY


With present techniques, most of the integral equations
involved in optimum demodulation are difficult to solve
explicitly. However, if a reasonable delay can be tolerated,
approximate solutions to a large class of AM problems
can be obtained. The demodulator is found to be a synchronous
demodulator followed by a type of Wiener filter.
Problems of this nature have been treated in some detail¹⁵
for the one-dimensional case. Extensions to multidimensional
cases are straightforward.¹⁶


15 J. B. Thomas, “On the Statistical Design of Demodulation
Systems for Signals in Additive Noise,” Stanford University Electronics
Res. Lab., Stanford, Calif., Tech. Rept. No. 88; August, 197

16 J. B. Thomas, T. R. Williams, J. Wolf and E. Wong, “The
demodulation of AM signals in noise,” Proc. 1959 IRE Convention
on Military Electronics, pp. 138-146.


NONLINEAR MODULATIONS


For nonlinear forms of modulation such as FM and PM,
the problem of optimum estimation cannot be reduced to
that of finding a time varying filter. In general, one has to
consider the solution of (20) for $\bar{a}^*(u)$. Although this
equation is not usually amenable to explicit solution, it is
in a form that can be treated by analog techniques. Indeed,
if feedback is allowed in the system, it essentially specifies
the demodulator. Some work along this line has been
initiated.¹⁷


PROBLEMS IN CARRIER SPECIFICATION 


In this formulation, the phases, amplitudes and frequencies
of the carriers are assumed known. In practice,
this knowledge must be obtained frequently either by
operating on the received waveforms or by transmitting
the carriers over a separate channel. Both of these methods
involve errors due to noise and thus cause additional errors
in the estimation of signals. Such difficulties are common
to all synchronous receiver systems.


APPENDIX 


With the use of the orthonormality condition given by
(14), the coefficients of expansion $a_p$ can be expressed as

$$a_p = \int_{t-T}^{t} \bar{\varphi}_p'(u)\,\bar{x}(u)\,du. \tag{81}$$

Therefore, the sum $\sum_{p=1}^{\infty} \lambda_p a_p^2$ is evaluated to be

$$\sum_{p=1}^{\infty} \lambda_p a_p^2 = \sum_{p=1}^{\infty} \lambda_p \int_{t-T}^{t}\!\int_{t-T}^{t} \left(\sum_{i=1}^{N} \varphi_p^{(i)}(u)\,x_i(u)\right) \left(\sum_{j=1}^{N} \varphi_p^{(j)}(v)\,x_j(v)\right) du\,dv \tag{82}$$

where $N = q + k$. Now, define the matrix $Q(u, v)$ by the
relationship

$$Q_{ij}(u, v) = \sum_{p=1}^{\infty} \lambda_p\,\varphi_p^{(i)}(u)\,\varphi_p^{(j)}(v). \tag{83}$$

Then, the sum $\sum_{p=1}^{\infty} \lambda_p a_p^2$ becomes

$$\sum_{p=1}^{\infty} \lambda_p a_p^2 = \int_{t-T}^{t}\!\int_{t-T}^{t} \bar{x}'(u)\,Q(u, v)\,\bar{x}(v)\,du\,dv. \tag{84}$$

The matrix $Q(u, v)$ is related to the covariance function
matrix $R(u, v)$. This relationship becomes clear when the
integral

$$\sum_{j=1}^{N} \int_{t-T}^{t} R_{ij}(u, v)\,Q_{jl}(v, w)\,dv$$

is examined. With the substitution of (83) for $Q_{jl}(v, w)$
and the use of (13), this integral becomes

$$\sum_{j=1}^{N} \int_{t-T}^{t} R_{ij}(u, v)\,Q_{jl}(v, w)\,dv = \sum_{p=1}^{\infty} \lambda_p\,\varphi_p^{(l)}(w) \int_{t-T}^{t} \sum_{j=1}^{N} R_{ij}(u, v)\,\varphi_p^{(j)}(v)\,dv = \sum_{p=1}^{\infty} \varphi_p^{(i)}(u)\,\varphi_p^{(l)}(w). \tag{85}$$

The sum on the right-hand side above satisfies the
identity

$$\sum_{p=1}^{\infty} \varphi_p^{(i)}(u)\,\varphi_p^{(l)}(w) = \delta_{il}\,\delta(u - w). \tag{86}$$

To prove this identity, multiply both sides of (86) by
$\varphi_s^{(i)}(u)$, sum over the index $i$, and integrate with respect
to $u$. With the use of the orthonormality condition, this
procedure yields the identity

$$\varphi_s^{(l)}(w) = \varphi_s^{(l)}(w),$$

showing the validity of (86). Therefore, the matrix
$Q(v, w)$ is related to the covariance function matrix
$R(u, v)$ by the matrix integral equation

$$\int_{t-T}^{t} R(u, v)\,Q(v, w)\,dv = \delta(u - w)\,\mathbf{1}, \tag{87}$$

$\mathbf{1}$ being a unit matrix.

17 R. Jaffe and E. Rechtin, “Design and performance of phase-locked circuits capable of near optimum performance over a wide range of input signal and noise levels,” IRE Trans. on Information Theory, vol. IT-1, pp. 66-76; March, 1955.
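A finite-dimensional analogue may make the Appendix argument concrete. In the sketch below the integrals become 2-by-2 matrix products, the covariance matrix `R` and its eigenvectors are chosen by hand, and, following the convention of (13), each `lam` is the reciprocal of an ordinary eigenvalue of `R`. All names are illustrative and none of this is taken from the paper.

```python
import math

# Assumed 2x2 covariance matrix; its ordinary eigenvalues are 3 and 1.
R = [[2.0, 1.0],
     [1.0, 2.0]]

s = 1.0 / math.sqrt(2.0)
# (lam_p, phi_p) pairs with phi_p = lam_p * R * phi_p, the discrete form of (13).
MODES = [(1.0 / 3.0, [s, s]),
         (1.0, [s, -s])]

def build_q(modes):
    """Q_ij = sum_p lam_p * phi_p[i] * phi_p[j], the discrete form of (83)."""
    n = len(modes[0][1])
    return [[sum(lam * phi[i] * phi[j] for lam, phi in modes)
             for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Plain matrix product, standing in for the integral over v in (87)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def quadratic_form(Q, x):
    """x' Q x, the discrete form of the right-hand side of (84)."""
    return sum(x[i] * Q[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))

def weighted_coefficient_sum(modes, x):
    """sum_p lam_p * a_p**2 with a_p = phi_p . x, the discrete forms of (81) and (82)."""
    return sum(lam * sum(p * xi for p, xi in zip(phi, x)) ** 2 for lam, phi in modes)

if __name__ == "__main__":
    Q = build_q(MODES)
    print(matmul(R, Q))                       # close to the unit matrix
    x = [0.4, -1.1]
    print(quadratic_form(Q, x), weighted_coefficient_sum(MODES, x))
```

In this discrete setting `Q` is simply the inverse of `R`, which is the matrix counterpart of (87), and the two printed scalars agree as (84) asserts.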




On the Mean-Square Noise Power of an Optimum 
Continuous Filter for Correlated Noise* 


MARVIN BLUM†, MEMBER, IRE


Summary—This paper presents the equations for the mean- 
square error of the output of a continuous finite memory filter. 
The filter output error is unbiased for arbitrary input polynomials 
up to degree n, and has minimum variance. The input is taken as 
a polynomial of degree n plus random stationary noise. Noise 
processes are considered, 1) where the noise is exponentially 
correlated, and 2) in the white noise case. The solution for a desired 
output which is an arbitrary fixed linear operation on the input 
polynomial is given. 

Tables and graphs of the mean-square error for the derivative
and prediction operator for the 0th, 1st, and 2nd derivatives are
presented, and for input polynomials up to the 6th degree.


INTRODUCTION 


IN a paper by Zadeh and Ragazzini,¹ a solution for
the optimum continuous filter in a minimum variance
sense was given. In this paper, the authors developed,
as an illustrative example, the detailed equations for the
mean-square error output of a first order polynomial
passing filter for exponentially correlated noise input.
However, an attempt to extend the details of the solution
to higher order filters and maintain an analytic solution
leads to prohibitive labors because of the necessity of
inverting high order matrices whose elements are each
functions. To circumvent this difficulty, a method is used
which is based on a generalization of the optimum filter
as presented in a previous paper.² In this solution a set
of orthogonal polynomials is defined such that the submatrices,
depending upon the order of the filter, are diagonalized
and an analytic solution becomes feasible for
any order of the filter.


WHITE NOISE SOLUTION


Let the input to a continuous filter with finite memory
over the interval $(0, T)$ be

$$S(t) = P(t) + N(t) \tag{1}$$

where

$$P(t) = \sum_{k=0}^{n} a_k P_k(t) \tag{2}$$


* Received by the PGIT, April 8, 1959.

† System Development Corp., Santa Monica, Calif.

1 L. A. Zadeh and J. R. Ragazzini, “An extension of Wiener’s theory of prediction,” J. Appl. Phys., vol. 21, pp. 645-655; July, 1950.

2 Marvin Blum, “Generalization of the class of nonrandom inputs of the Zadeh-Ragazzini prediction model,” IRE Trans. on Information Theory, vol. IT-2, pp. 76-81; June, 1956.


and $P_k(t)$ is a modified Legendre polynomial³ given by


rio = Een QA) 


Then it can be shown that these polynomials are orthogonal
with respect to integration over the interval $(0, T)$,
e.g.,

$$\int_0^T P_j(t)\,P_k(t)\,dt = 0, \qquad j \ne k, \tag{4}$$

$$\int_0^T [P_k(t)]^2\,dt = \frac{T}{2k + 1} = S_k. \tag{5}$$

It is easily shown that the right hand side is unchanged if
$(T - t)$ is substituted for $t$ in the left hand side of (4) and
(5). Let $N(t)$ be a white noise process with ensemble
average equal to zero for all $t$, such that the correlation
function is defined by $\delta(\tau)$, and the spectral density function
is unity.
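The orthogonality relations can be verified exactly with rational arithmetic. The sketch below assumes that the modified Legendre polynomials are the ordinary Legendre polynomials shifted to the interval $(0, T)$ and normalized to equal 1 at $t = T$; the paper's own normalization may differ, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def shifted_legendre(k):
    """Coefficients c[j] of y**j, y = t/T, for the Legendre polynomial shifted to (0, 1)."""
    return [Fraction((-1) ** (k + j) * comb(k, j) * comb(k + j, j))
            for j in range(k + 1)]

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def integral_0_to_T(p, T):
    """Integral over t from 0 to T of a polynomial in y = t/T."""
    return T * sum(c / (j + 1) for j, c in enumerate(p))

def inner(j, k, T):
    """Integral of P_j(t) * P_k(t) over (0, T)."""
    return integral_0_to_T(poly_mul(shifted_legendre(j), shifted_legendre(k)), T)

if __name__ == "__main__":
    T = Fraction(5)
    print(inner(2, 3, T))        # orthogonality: zero
    print(inner(3, 3, T))        # S_3 = T / 7
```

Exact fractions are used so that the zero in the orthogonality relation is an exact zero rather than a rounding-level residue.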
Using the notation of Blum’s previous paper,² let

$$Q_k = \int_0^T k(t)\,P_k(T - t)\,dt, \qquad k = 0, 1, \cdots, n, \tag{6}$$

where the unspecified desired linear transformation on the
input data defines $k(t)$. Note that by Blum’s equation
(14)⁴

$$Q_k = \int_0^T W(x)\,P_k(T - x)\,dx, \tag{7}$$

where $W(x)$ is the weighting function of the optimum
filter. Let

$$S_{j,k} = \int_0^T P_j(T - t)\,P_k(T - t)\,dt. \tag{8}$$

Then by (4), the $(n + 1) \times (n + 1)$ matrix $S$ whose
elements are $S_{j,k}$ is given by a diagonal matrix whose
elements are $S_0, S_1, \cdots, S_n$ [see (5)]. The rms output
error by (42)⁴ becomes

$$\sigma^2 = \sum_{k=0}^{n} \frac{Q_k^2}{S_k}. \tag{9}$$


’ The modification consists of substituting y 
ieee 


Gy wh; 
4 Equation numbers of author’s paper, footnote 2.


The weighting function is given by (41),⁴ and becomes

$$W(x) = \sum_{k=0}^{n} \frac{Q_k}{S_k}\,P_k(T - x).$$

In particular, when the desired output is the $L$th derivative
of the input at $T - a$, then


W(x) = Oya... (10) 


and

$$Q_k = P_k^{(L)}(T - a), \tag{11}$$

so that the rms output error (9) is given by

$$\sigma^2 = \sum_{k=L}^{n} \frac{[P_k^{(L)}(T - a)]^2}{S_k}. \tag{12}$$

Let

$$y = \frac{T - a}{T};$$

then

$$P_k^{(L)}(T - a) = T^{-L}\,\frac{d^L}{dy^L} P_k(y), \tag{13}$$

so that (12) becomes

$$T^{2L+1}\,\sigma^2 = \sum_{k=L}^{n} [P_k^{(L)}(y)]^2\,(2k + 1) = H_{n,L}^2(y). \tag{14}$$
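For end-point smoothing with no derivative, the white-noise weighting function and its error can be worked exactly. The sketch below rests on several assumptions: the polynomials are Legendre polynomials shifted to $(0, T)$ with value 1 at the end point, so that $Q_k = 1$ and $S_k = T/(2k+1)$, and the weighting function is the sum of $(Q_k/S_k)\,P_k(T - x)$ over $k$. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def shifted_legendre(k):
    """Coefficients in y of the Legendre polynomial shifted to (0, 1), equal to 1 at y = 1."""
    return [Fraction((-1) ** (k + j) * comb(k, j) * comb(k + j, j))
            for j in range(k + 1)]

def poly_add(p, q):
    """Sum of two polynomials given as coefficient lists."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(n)]

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def integrate_unit(p):
    """Integral of a polynomial in y over (0, 1)."""
    return sum(c / (j + 1) for j, c in enumerate(p))

def endpoint_weighting(n, T):
    """Coefficients in y = 1 - x/T of W(x) = sum_k ((2k+1)/T) * P_k(T - x)."""
    w = [Fraction(0)]
    for k in range(n + 1):
        w = poly_add(w, [Fraction(2 * k + 1) / T * c for c in shifted_legendre(k)])
    return w

def moments(n, T):
    """Integrals over (0, T) of W(x), x*W(x) and W(x)**2, using x = T*(1 - y)."""
    w = endpoint_weighting(n, T)
    area = T * integrate_unit(w)
    first = T * integrate_unit(poly_mul(w, [T, -T]))
    energy = T * integrate_unit(poly_mul(w, w))
    return area, first, energy

if __name__ == "__main__":
    print(endpoint_weighting(1, Fraction(2)))
    print(moments(1, Fraction(2)))
```

For a first-degree polynomial input this gives $W(x) = (4 - 6x/T)/T$; the unit area and zero first moment express unbiasedness, and the integral of $W^2$ equals $(n+1)^2/T$, which agrees with the sum of $Q_k^2/S_k$.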


PROPERTIES OF $H_{n,L}^2(y)$, THE MEAN-SQUARE
PROPORTIONALITY FACTOR


The prediction parameter $a$ has the following interpretation.
When $a = 0$, $y = 1$, and the estimate corresponds
to an end point or zero lag smoothing. When
$a = T/2$, $y = 1/2$, the output of the filter at time $T$ is
an estimate of the midpoint of the data interval. When
$a < 0$ so that $y > 1$, the output of the filter is an estimate
of a predicted value of the input. The function $H_{n,L}^2(y)$
satisfies the following relationships:

$$H_{n+1,L}^2(y) \ge H_{n,L}^2(y), \tag{15}$$

$$H_{n,L}^2(1 - y) = H_{n,L}^2(y), \quad \text{and} \tag{16}$$

$$H_{n,L}^2(y) = \sum_{k=0}^{n} a_k \left(y - \tfrac{1}{2}\right)^{2k}. \tag{17}$$

In (15) the equality sign holds only for the roots of $P_{n+1}^{(L)}(y)$
and certain adjacent pairs of integers. Eqs. (16) and (17)
state that $H_{n,L}^2(y)$ is symmetric about $y = 1/2$ and is
representable as a polynomial of degree $2n$ in $y$. As a
consequence of (17), $H_{n,L}^2(y)$ is a polynomial of degree $2n$
in $y$ but only of degree $n$ in $(y - 1/2)^2$. Thus for purposes
of interpolation it is more convenient to use a divided
difference interpolation formula using the variable

$$z = \left(y - \tfrac{1}{2}\right)^2. \tag{18}$$
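The proportionality factor can be evaluated exactly for small orders. The sketch below takes the factor to be the sum over $k$ from $L$ to $n$ of $(2k+1)$ times the square of the $L$th derivative of $P_k(y)$, with $P_k$ assumed to be the Legendre polynomial shifted to $(0, 1)$; the names `H2` and `poly_derivative` are hypothetical and the normalization is an assumption.

```python
from fractions import Fraction
from math import comb

def shifted_legendre(k):
    """Coefficients in y of the Legendre polynomial shifted to (0, 1)."""
    return [Fraction((-1) ** (k + j) * comb(k, j) * comb(k + j, j))
            for j in range(k + 1)]

def poly_derivative(c, order):
    """Coefficient list of the derivative of the given order."""
    for _ in range(order):
        c = [j * c[j] for j in range(1, len(c))]
    return c

def poly_eval(c, y):
    """Value of the polynomial with coefficient list c at y."""
    return sum(cj * y ** j for j, cj in enumerate(c))

def H2(n, L, y):
    """Sum over k = L..n of (2k+1) * [d^L P_k / dy^L (y)]**2."""
    total = Fraction(0)
    for k in range(L, n + 1):
        derivative = poly_derivative(shifted_legendre(k), L)
        total += (2 * k + 1) * poly_eval(derivative, y) ** 2
    return total

if __name__ == "__main__":
    y = Fraction(1, 3)
    print(H2(3, 0, Fraction(1)))          # end-point smoothing, n = 3
    print(H2(2, 1, y), H2(2, 1, 1 - y))   # symmetry about y = 1/2
    print(H2(3, 1, y) >= H2(2, 1, y))     # raising the order never lowers the factor
```

The symmetry about $y = 1/2$ follows because each shifted polynomial is either even or odd about the midpoint, so its square and the squares of its derivatives are even.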


The orthogonal polynomials $P_k(y)$, $k = 0, 1, \cdots, 8$
are listed in Milne.⁵ A listing of the divided difference
interpolation (and extrapolation) formulas is found in
Appendix I. Graphs of $H_{n,L}(y)$ vs. $y$ are plotted for
$n = 0, 1, \cdots, 6$, $L = 0, 1, 2$ and $1/2 \le y \le 7/2$.


5 W. E. Milne, “Numerical Calculus,” Princeton University
Press, Princeton, N. J., 3rd ed., p. 260; 1949.


EXPONENTIALLY CORRELATED NOISE 


The solution for the mean-square error of estimation 
will be obtained for the case when the autocorrelation 
function is given in two forms: 


a) o, corresponding to the correlation function y,(r) = 
[a/2] e~*'"', and 
b) 6” corresponding to the correlation function 6G) = 
—alr| 
: (19) 


The solution for form a) will approach the solution for the
white noise case as $a \to \infty$. The details of the solution for
form b) will be given.

In (23),* it is shown that the weighting function W(z), 
defining the optimum unbiased filter with minimum 
variance at 7) = T, must satisfy the integral equation 


fe W(x) 0(t — x) dx = 5 r,P(T' — 2) 


OFS (20) 


The explicit solution of (20) is shown by (32)⁴ and
(35)⁴ to be of the form (21), where a standard interval $(0, T)$
will be used without loss of generality, and, thus, the unit
step function may be deleted. The coefficients $\mu_k$ are
independent of time and are to be determined.

$$W(x) = \sum_{k=0}^{n} \mu_k\,P_k(T - x) + C\,\delta(x) + D\,\delta(T - x), \qquad 0 \le x \le T. \tag{21}$$


To determine the coefficients $\mu_k$, $C$ and $D$ of (21) one uses
the $n + 1$ constraint relationships of (14)⁴

$$\int_0^T W(x)\,P_k(T - x)\,dx = Q_k, \qquad k = 0, 1, 2, \cdots, n. \tag{22}$$


One can substitute (21) into (20) to obtain two linear
homogeneous equations involving the $C$ and $D$. These
come about because (20) is an identity in $t$, and in substituting
into the left-hand side of (20), one generates
functions $\theta(t)$ and $\theta(T - t)$. For the equation to remain an
identity, the coefficients of these terms must be set equal
to zero since these functions do not appear on the right-hand
side of (20).


HOMOGENEOUS EQUATIONS FOR C AND D

On substituting (21) into (20) one obtains,

$$\int_0^T \Big\{ \sum_{k=0}^{n} \mu_k P_k(T - x) + C\,\delta(x) + D\,\delta(T - x) \Big\}\, e^{-a|t - x|}\,dx = \sum_{k=0}^{n} \lambda_k P_k(T - t). \tag{23}$$




Let us evaluate the kth component on the left-hand side 
of (23), then 


T k 6 L L-j (i) 
f Pe pK rn ake ted dz = —6() > 2 os a 
0 L=0 7=0 a 
b T 7 L- Ch: (7) : ‘ 
+ pty CoO TO" 14-0) 
bn (—1)"L! 
Sar D2, ats a (24) 
where the substitutions 
(m)? = (m)(m — 1) +--+: (m—j+1), (m=1 
k L 
[MON DS ee ee eg (25) 
L=0 fP 
L 
ba = NE”) 


have been made. 

Performing the integration as indicated in (23) and
setting the coefficients of $\phi(t)$ and $\phi(T - t)$ equal to zero,
one obtains the two equations


$$C + \sum_{k=0}^{n} \mu_k A_k = 0, \qquad D + \sum_{k=0}^{n} \mu_k B_k = 0, \tag{26}$$
where from (24), 
k L 
Ay = -2 OD besa)” (27) 
Se Sein oer (28) 
a L=0 


Substituting the weighting function from (21) into (22),
one obtains the system of linear equations

$$\mu_k S_k + C\,P_k(T) + D\,P_k(0) = Q_k, \qquad k = 0, 1, 2, \ldots, n. \tag{29}$$

This system of linear equations can be solved by inverting
a matrix by the method of submatrices.⁶ Let the
equations (26) and (29) be written as
| 11 | 
oo, OF 0 0 
0 Sp r0 0 0 
0. 0 8, 0 0 
0: 0? 0 eis 
Ao A, As n 
Bs B, 1/055: Fees Be 
| 2» | 
⁶ R. A. Frazer, W. J. Duncan, and A. Collar, “Elementary
Matrices,” Cambridge University Press, Cambridge, Eng., pp.
112-118; 1955.


IRE TRANSACTIONS ON INFORMATION THEORY 
The matrix indicated by $\alpha$ is an $(n + 3) \times (n + 3)$
matrix with submatrices as follows:

a) $\alpha_{11}$ is a diagonal $(n + 1) \times (n + 1)$ submatrix with
elements $S_k$,

b) $\alpha_{12}$ is an $(n + 1) \times 2$ submatrix,

c) $\alpha_{21}$ is a $2 \times (n + 1)$ submatrix, and

d) $\alpha_{22}$ is a $2 \times 2$ identity matrix.


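Then $B = \alpha^{-1}$ (the inverse matrix) is given by

$$B = \begin{bmatrix} B_{11}\,[(n+1) \times (n+1)] & B_{12}\,[(n+1) \times 2] \\ B_{21}\,[2 \times (n+1)] & B_{22}\,[2 \times 2] \end{bmatrix}. \tag{31}$$

Let

$$X = \alpha_{11}^{-1}\alpha_{12}, \qquad Y = \alpha_{21}\alpha_{11}^{-1}, \qquad \theta = \alpha_{22} - Y\alpha_{12}, \tag{32}$$

then

$$B_{11} = \alpha_{11}^{-1} + X\theta^{-1}Y, \qquad B_{12} = -X\theta^{-1}, \qquad B_{21} = -\theta^{-1}Y, \qquad B_{22} = \theta^{-1}, \tag{33}$$

where it is assumed that $\theta$ and $\alpha_{11}$ are nonsingular. Using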
the relationships 


A, = (—1)B, 
PO) =41 k = Ofy.2) >= 2, 
PT) = (-1)' | 
then 
Bi =S'*+H (33 


where S~* is the (n+ 1) X (n + 1) diagonal matrix wit 
elements S;* and H is a (n + 1) X (n+ 1) matrix wit 


elements 
= 2B 
Hox 9; ie So;Sox(L oat I) 
| Q12 | 
| PT IPO) Mh Qo 
: P,(T)P,(0) Bi Qa: 
! P.(T)P2(0) 
Me Q» 
i A as te (30) 
| ‘ 
PUP PRO 
pees were Qn 
. C 0 
ne 0) 4 D 0 
| Q22 | 


60 


r even columns and rows; 


Bor Co 


H Re ee ees ee 
eee Dani opant L ore Dxee 

r odd columns and rows; and zero, for rows and columns 
hich are not jointly even or odd. The quantities 2x,” 


ad a,” are defined by 
[n/2] 
Go Se a Sa ee 
k=0 2k ; (37) 
[(n-1)/2] Bot : 
Baga he £ far leo: > 
k=0 Soret 
where $[\ ]$ is the largest integer in the bracket, e.g.,

$$\left[\tfrac{5}{2}\right] = 2, \qquad \left[\tfrac{6}{2}\right] = 3.$$
Solving (30) for $\mu_k$, $C$ and $D$, one has,
[1 ar ] k=0 k (38) 
D=-C 
w= BQ 


where $\mu$ and $Q$ are column matrices respectively of $\mu_k$ and
$Q_k$.
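The method of submatrices used above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`matmul`, `inverse`, `block_inverse`, `join_blocks`) and the numerical entries of the blocks are hypothetical, chosen for a small case with $n = 1$, and exact rational arithmetic is used so that the product of the partitioned matrix and its inverse can be compared with the identity exactly.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matadd(a, b, sign=1):
    """Elementwise a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def negate(a):
    """Elementwise negation."""
    return [[-x for x in row] for row in a]


def inverse(m):
    """Gauss-Jordan inverse in exact rational arithmetic (m nonsingular)."""
    n = len(m)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def block_inverse(a11, a12, a21, a22):
    """Inverse of [[a11, a12], [a21, a22]] by submatrices."""
    a11_inv = inverse(a11)
    x = matmul(a11_inv, a12)                       # X = a11^-1 a12
    y = matmul(a21, a11_inv)                       # Y = a21 a11^-1
    theta = matadd(a22, matmul(y, a12), sign=-1)   # theta = a22 - Y a12
    theta_inv = inverse(theta)
    b11 = matadd(a11_inv, matmul(matmul(x, theta_inv), y))
    b12 = negate(matmul(x, theta_inv))
    b21 = negate(matmul(theta_inv, y))
    return b11, b12, b21, theta_inv


def join_blocks(b11, b12, b21, b22):
    """Assemble four blocks into one matrix."""
    return [l + r for l, r in zip(b11, b12)] + [l + r for l, r in zip(b21, b22)]


# Hypothetical blocks for n = 1: diagonal a11, identity a22.
a11 = [[2, 0], [0, 3]]
a12 = [[1, 1], [1, -1]]
a21 = [[1, 2], [3, 4]]
a22 = [[1, 0], [0, 1]]

alpha = join_blocks(a11, a12, a21, a22)
beta = join_blocks(*block_inverse(a11, a12, a21, a22))
identity = [[int(i == j) for j in range(4)] for i in range(4)]
product_is_identity = matmul(alpha, beta) == identity
```

Because the leading block is diagonal, its inverse is immediate, and the only matrix that has to be inverted numerically is the $2 \times 2$ matrix $\theta$, whatever the order $n$ of the filter.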


EVALUATION OF THE MEAN-SQUARE ERROR 
FOR EXPONENTIALLY CORRELATED NOISE 


To evaluate $\sigma_b^2$ for correlation form b) [$\sigma_b^2$ corresponding
to the correlation function $\phi_b(\tau) = e^{-a|\tau|}$], one must
determine the relationships between the $\mu_k$ and the $\lambda_k$ of
(20), since it is easily shown that

$$\sigma_b^2 = \sum_{k=0}^{n} Q_k \lambda_k. \tag{39}$$

Using the polynomial component of the right hand side
of (24), one obtains


pede, (LT ap i) (La) =F (—1)'} 
Ta 7+1 


(40) 


SS ee) aa 
k=0 


By suitably combining terms in the left hand side of (40),
one obtains


n [k/2] Pen =f) 


a = DaPA - 4 (41) 
where
(2L) Cee 
PO E= Daan P| (42) 




Multiplying both sides of (41) by $P_j(T - t)$, and integrating
over the interval $(0, T)$, one obtains


2 7 e/a ye — )P(T — 1) rT 
age >) ieee romana Serres 
Let 
BE) (ae Pa Loe 
-@j+n f ye San Oe ay 


Koj-= 0,152: ER at 


and define the matrix V with elements V;,4,,;4:. Then 


Sn Vu = (45) 
The mean-square error is given by

$$\sigma_b^2 = \lambda' Q, \tag{46}$$

where the prime indicates the transpose is taken. By
substituting (38) and (45) into (46) one may write,


2 


ooh os 
Ce ear 


{Q’Bi,V’Q}. (47) 
Eq. (47) may be decomposed into a number of com- 


ponents as follows: 


a) the matrix V’ may be written 


V’ = TI + 1’: (48) 
b) the matrix B/, may be written by (35) as 
i= S'+4+aH', (49) 


where J is an (n + 1) X (n + 1) identity matrix, 
and p’ is defined in Appendix II, so that 


= “a {> Qil2k + 1] + Q’ 4.9} (50) 


where 


= TS"p’ + TH’ + TH'p’. (51) 
Figs. 1, 2, and 3 present $T^L \sigma_b$ vs $aT$ for $n = 0, 1, 2, \ldots, 6$
and $L = 0, 1,$ and $2$, for $y = 1$, while Figs. 4, 5, and 6
present the same data for $y = 1/2$.

The first term in (50) is the same solution as the white
noise case except for the factor $2/a$, and it is independent
of $aT$ except for a dependence on $T$ due to $Q_k$. The second
term involves the matrix A, whose elements go to zero
as $aT \to \infty$, so that the asymptotic solution is given by


Lim (Te) = : Q:[2k + 1). (52) 


aT>0 

The Q, defined for the Zth derivative and prediction 
operator is then 

Ae 


ate 
= 2H L(y). (53) 


Fig. 1—($\sigma_b$ vs $aT$). Rms output error ($\sigma_b$) vs ($aT$) for exponentially correlated ($e^{-a|\tau|}$) input noise, parametric in the order ($n$) of the filter of memory span $T$. The output is a zero lag ($y = 1$), 0th ($L = 0$) derivative estimate of the input polynomial.

Fig. 2—($T\sigma_b$ vs $aT$). The product of the memory span of the filter ($T$) and the rms output error ($\sigma_b$) vs ($aT$) for exponentially correlated ($e^{-a|\tau|}$) input noise, parametric in the order ($n$) of the filter. The output is a zero lag ($y = 1$), first derivative estimate ($L = 1$) of the input polynomial.

Fig. 3—($T^2\sigma_b$ vs $aT$). The product of the square of the memory span of the filter ($T^2$) and the rms output error ($\sigma_b$) vs ($aT$) for exponentially correlated ($e^{-a|\tau|}$) input noise, parametric in the order ($n$) of the filter. The output is a zero lag ($y = 1$), second derivative ($L = 2$) estimate of the input polynomial.


Fig. 4—($\sigma_b$ vs $aT$). The rms output error ($\sigma_b$) vs ($aT$) for exponentially correlated ($e^{-a|\tau|}$) input noise, parametric in the order ($n$) of the filter. The output lags the input by 1/2 memory span (midpoint estimate, $y = 1/2$) and is an estimate of the 0th derivative ($L = 0$) of the input polynomial.

Fig. 5—($T\sigma_b$ vs $aT$). The product of the memory span of the filter ($T$) and the rms output error ($\sigma_b$) vs ($aT$) for exponentially correlated ($e^{-a|\tau|}$) input noise, parametric in the order ($n$) of the filter. The output lags the input by 1/2 memory span (midpoint estimate, $y = 1/2$) and is an estimate of the first derivative ($L = 1$) of the input polynomial.

Fig. 6—($T^2\sigma_b$ vs $aT$). The product of the square of the memory span of the filter ($T^2$) and the rms output error ($\sigma_b$) vs ($aT$) for exponentially correlated ($e^{-a|\tau|}$) input noise, parametric in the order ($n$) of the filter. The output lags the input by 1/2 memory span (midpoint estimate, $y = 1/2$) and is an estimate of the second derivative ($L = 2$) of the input polynomial.




Note that the spectrum associated with form b) of (19) is
given by

$$S(\omega) = \frac{2a}{\omega^2 + a^2}, \tag{54}$$

so that

$$S(0) = \frac{2}{a}, \tag{55}$$
and (52) may be written for the asymptotic solution 
Lim’ 6’ & S()cz. (56) 


aT © 
The asymptotic solution when form a) of (19) is used, 


is given by 


: 2 
immnmeic = 


aT>3@ 


> ik + 1, (57) 


and is identical to the white noise solution (52). 


APPENDIX I


RMS ERROR PROPORTIONALITY FACTOR $H_{n,L}$


A listing of the functions $H_{n,L}^2(z)$, $z = (y - 1/2)^2$, for
$n = 0, 1, 2, \ldots, 6$, and $L = 0, 1,$ and $2$ follows. Figs. 7,
8, and 9 contain the graphs of $H_{n,L}(y)$ vs $y$ for the zero,
first and second derivative estimators, parametric in the
order of the filter ($n$).

Note that the functions $H_{n,L}(y)$ have maxima and
minima in the interval $1/2 < y < 1$. For fixed memory
span, $T$, $H_{n,L}(y)$ is directly proportional to the rms output
error $\sigma$. For even order filters (when the order exceeds the
order of the derivative) one obtains the smallest rms errors
by estimating the input polynomial at a point other than
the mid point, and thus gains by obtaining more accurate
estimates with smaller lags. For prediction ($|y| > 1$) note
that the rms increases monotonically with the order of the 
filter (n), as |y|°"”’. 

The functions $H_{n,0}^2(z)$, $n = 0, 1, \ldots, 6$, are given by

$H_{0,0}^2(z) = 1$,
$H_{1,0}^2(z) = 1 + 12z$,
$H_{2,0}^2(z) = 2.25 - 18z + 180z^2$,
$H_{3,0}^2(z) = 2.25 + 45z - 660z^2 + 2800z^3$,
$H_{4,0}^2(z) = 3.515625 - 56.25z + 1837.5z^2 - 16100z^3 + 44100z^4$,
$H_{5,0}^2(z) = 3.515625 + 98.4375z - 3937.5z^2 + 58590z^3 - 343980z^4 + 698544z^5$,

⁷ Preliminary computations show that the asymptotic approximation
for the evaluation of the rms holds within a relative error
of 5 per cent if $aT > 100$. These calculations were performed on end
point estimates. Similar computations for mid point estimates yield
relative errors of 1 per cent or less.


Fig. 7—[$H_{n,0}(y)$ vs $y$]. Legend: $n$ = order of the filter (highest degree input polynomial for which the output is unbiased); $T$ = memory span of the filter. The output is an estimate of the 0th derivative of the input polynomial.


and

$H_{6,0}^2(z) = 4.78515625 - 114.84375z + 7579.6875z^2 - 163905z^3 + 1576575z^4 - 6869016z^5 + 11099088z^6$,

where $z = (y - 1/2)^2$.
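A listing of this kind is easy to mistranscribe, so a short numerical check may be useful. The sketch below stores the coefficients of $H_{n,0}^2(z)$ as given above and evaluates them by Horner's rule; the names `H2_L0` and `h_squared` are hypothetical. Evaluated at the zero lag point $y = 1$ (that is, $z = 1/4$), each listed polynomial comes out to $(n + 1)^2$, and at the mid point $y = 1/2$ it reduces to its constant term.

```python
# Coefficients of H^2_{n,0}(z) in ascending powers of z, z = (y - 1/2)^2,
# copied from the listing for the 0th derivative estimator.
H2_L0 = {
    0: [1.0],
    1: [1.0, 12.0],
    2: [2.25, -18.0, 180.0],
    3: [2.25, 45.0, -660.0, 2800.0],
    4: [3.515625, -56.25, 1837.5, -16100.0, 44100.0],
    5: [3.515625, 98.4375, -3937.5, 58590.0, -343980.0, 698544.0],
    6: [4.78515625, -114.84375, 7579.6875, -163905.0,
        1576575.0, -6869016.0, 11099088.0],
}


def h_squared(n, y):
    """Evaluate H^2_{n,0} at lag parameter y using Horner's rule in z."""
    z = (y - 0.5) ** 2
    acc = 0.0
    for coeff in reversed(H2_L0[n]):
        acc = acc * z + coeff
    return acc


# Zero lag (end point) and mid point values for every listed order.
end_point = {n: h_squared(n, 1.0) for n in H2_L0}
mid_point = {n: h_squared(n, 0.5) for n in H2_L0}
```

The symmetry about $y = 1/2$ stated in the text follows directly from the dependence on $z$ alone, so `h_squared(n, 0.0)` returns the same value as `h_squared(n, 1.0)`.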
The functions $H_{n,1}^2(z)$, $n = 0, 1, 2, \ldots, 6$, are given by

$H_{0,1}^2(z) = 0$,
$H_{1,1}^2(z) = 12$,
$H_{2,1}^2(z) = 12 + 720z$,
$H_{3,1}^2(z) = 75 - 1800z + 25200z^2$,
$H_{4,1}^2(z) = 75 + 6300z - 126000z^2 + 705600z^3$,
$H_{5,1}^2(z) = 229.6875 - 11075z + 463050z^2 - 5115600z^3 + 17463600z^4$,

and

$H_{6,1}^2(z) = 229.6875 + 24806.25z - 1256850z^2 + 23090760z^3 - 164157840z^4 + 399567168z^5$,

where $z = (y - 1/2)^2$.




Fig. 8—[$H_{n,1}(y)$ vs $y$]. Fig. 9—[$H_{n,2}(y)$ vs $y$].


The functions $H_{n,2}^2(z)$, $n = 0, 1, 2, \ldots, 6$, are given by

$H_{0,2}^2(z) = 0$,
$H_{1,2}^2(z) = 0$,
$H_{2,2}^2(z) = 720$,
$H_{3,2}^2(z) = 720 + 100800z$,


then 


Vis = 0 a) 
Vow 


if w and v are not jointly odd or even. 


2 (2) = 8820 — 3528002 + 63504002’, Virgen 
2 (2) = 8820 + 15876002 — 402192002? + 2794176002° V,. = Dae 
: Vig ets ate eo 
(2) = 4461.25 — 35721002 + 1833678002" 16 = Ge (ary ? 
— 24449040002’ + 99891792002", Sig oe Se EA +, 20160 a 665280 
p fay ae 2 UE (aT')” (aT (aT)' ) 
vere 2 = (y — 1/2)°. 

ae 

Vio = (aT)? {60}, 


APPENDIX II


The matrix V of (45) has elements given by (44) as Vi c= a 168 + Bee . 
lows: Let V,,, be the element in the wth row and vth (aT) (aT’) 


lumn f 


oe 10) 
U, Use 1, 2,3,-°* on -F 1; ’ (ary | 


sa = Tap (282, 
and 
Vs,7 ee {396} 
(aT) 
Let 
Pur = uA, 
then the matrix 7'S1’ of (51) is given by, (see 48) 
0 0 0 0 0 
0 0 0 0 0 
5pi,3 0 0 0 0 
DE seca ae D his oe 0 0 
Orel ay Son 0 0 
0 


0 11 po,6 0 11 ps6 


Note that each element of Z7'S4p’ is proportional to 
(aT)? for (aT) >1. The (n +1) X (n + 1) matrix TH’ 
is given by, 


IS 5K 
Oh aetna a 
‘alae te ee 
2 85 
TH’ =2| K 5K 
5 0 ro) 0 : 
1 1 
0 oe 0 (UKE 
2 




‘ where 


K, = pS Ome ee 


pore OX i= 1,2. 


The functions K; are given by (5) and (28), and are a 
proportional to (a7) * for a7’ > 1 so that each elemer 
of TH’ approaches zero as [a7] approaches infinity 
Finally, the matrix T'H’p’ is obtained and it is seen the 
each element of 7'H’p’ approaches zero as the O[a7] 
for (a? |i; 


A listing of the functions K; for 7 = 0, 1, 2, --- 6 areé 
follows: Let 
B; 
K; a is ) 
1 
Ko = at ) 
3 2 
5 a 21 
rae 1 120 
Beis -ti+8 Te ea a 
9 180 840 1680 
Bee wae +2 + of * @rye t @ry 1st : 
pe Peg fii 1 30 , 420 , 3360 , 15120 , 30240 
; aT al’ (TY CT) "GD: cue 
and 
zs 42 , 840 , 10080 , 75600 
Ke — aT B14 2 + pt (aT)? + (aT) (aT')* 
332640 , 665280) 
+ 1\5 6 | 
(aT) (aT)° ) 




IRE TRANSACTIONS ON INFORMATION THEORY 




Some Quantum Effects in Information Channels*


T. E. STERN†, MEMBER, IRE


Summary—In this paper the quantum nature of electromagnetic
radiation is used as a basis for a mathematical model of a continuous
channel. It is shown that this “photon channel” model leads to
more realistic conclusions regarding information transmission.
Among the results obtained are:

1) the maximum entropy for a narrow-band source under an
average power limitation,

2) the frequency distribution (Bose-Einstein) for a wide-band
power-limited source,

3) the transmission rate through a Poisson channel with additive
Poisson noise.


I. INTRODUCTION 


In this paper some of the postulates of information
theory, as it applies to continuous channels, will be
modified in order to put the theory in closer correspondence
with physical laws. It will be shown that these
modifications produce new and more useful results in the
ranges where quantum effects become important.

The definitions of entropy, information rate, channel
capacity, etc., as proposed by Shannon¹ for discrete
sources and channels, leave little to be desired. They form
a consistent mathematical system, and, what is more
important, they are in harmony with our intuitive notions.
However, as many investigators (including Shannon) have
pointed out, a formalistic extension of the theory to continuous
channels leads, at times, to somewhat erroneous
conclusions. Consider, for example, the well-known
expression for the capacity of a continuous channel with
average signal power $S$, average noise power $N$ (white,
Gaussian), and bandwidth $W$,

$$C = W \log\Big(1 + \frac{S}{N}\Big).$$

As $N$ approaches zero the capacity becomes infinite.
Intuitively, such a result must be rejected on the grounds
that it implies a receiver and transmitter with infinite
amplitude resolution. However, if we are to reject such a
formulation, what resolution should be assumed? An
answer to this question lies in the dual nature of electromagnetic
radiation. Consideration of the wave-like nature
of radiation leads naturally to the continuous model for
an information channel and to results typified by the
above example. On the other hand, consideration of the
corpuscular, or quantized nature of radiation leads
naturally to the discrete model. Just as it is necessary to
replace the classical wave model of radiation by the


* Received by the PGIT, November 2, 1959. This work was
partially supported by the National Science Foundation Grant
NSF-G 9780. Publication was assisted by the Marcellus Hartley
Fund.

† Dept. of Elec. Engrg., Columbia University, New York, N. Y.

¹ C. E. Shannon and W. Weaver, “The Mathematical Theory of
Communication,” University of Illinois Press, Urbana, p. 67; 1949.


quantum model to explain certain physical phenomena, 
so too is it possible to utilize the quantum model to give 
a more realistic picture of information transmission in 
those cases where quantum effects become important. 
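The unbounded growth of the continuous-channel capacity as the noise power goes to zero can be made concrete with a small numerical sketch. The function name `capacity` and the chosen values of $W$, $S$ and $N$ are illustrative assumptions only; natural logarithms are used, so the result is in natural units per second.

```python
import math


def capacity(bandwidth, signal_power, noise_power):
    """Continuous-channel capacity C = W log(1 + S/N), natural units per second."""
    return bandwidth * math.log(1.0 + signal_power / noise_power)


# Hypothetical unit bandwidth and unit signal power; the noise power is
# reduced by a factor of ten at each step.
noise_levels = [10.0 ** -k for k in range(7)]
rates = [capacity(1.0, 1.0, n) for n in noise_levels]
```

Each tenfold reduction in $N$ eventually adds about $\ln 10$ to the capacity, so the sequence `rates` never levels off; this is the behavior the photon model is meant to remove.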

Situations in which the quantized nature of the electro- 
magnetic field is dominant are becoming increasingly 
common. As communications systems move to higher 
frequencies and spread their power over wider band- 
widths, the average number of photons per unit time- 
bandwidth becomes correspondingly smaller. Thus, it is 
pertinent for the communication engineer to inquire into 
the limitations placed upon information rate by the 
photon nature of radiation. (A discussion of these effects 
in amplifiers of the Maser type will appear in a forth- 
coming paper.”) 

Gabor³,⁴ was apparently the first to introduce quantum
effects into the derivation of information rates, deducing
in a somewhat intuitive fashion an expression for channel
capacity. His expression resembles Shannon’s expression
for large signal and noise power. Furthermore, Gabor’s
work appears to be unique in this area, although many
physicists⁵,⁶ have gone in the reverse direction, applying
information theoretic principles to problems in modern
physics.

In this paper, the quantized nature of the electro- 
magnetic field is used to convert the continuous channel 
into an equivalent ‘photon channel,” and to derive some 
expressions for entropy and information rate in such 
channels. In Section II, the photon channel and source 
are defined. The maximum entropy distribution for a 
photon source with average power limitation is derived 
and discussed in Section III. In Sections IV and V the 
Poisson channel is considered. It is shown to resemble the 
continuous Gaussian channel for large average power but 
to behave quite differently for small average power. 


II. THE PHOTON CHANNEL


In constructing a mathematical model for a photon 
channel, it is necessary to consider two fundamental 
relations of quantum mechanics: 1) the Planck relation, 
$E = hf$ (where $E$ is the energy of a photon at frequency $f$


² T. E. Stern, “Information rates in photon channels and photon
amplifiers,” to be published in the 1960 IRE NATIONAL CONVENTION
RECORD.

³ D. Gabor, “Lectures on Communication Theory,” Mass. Inst.
Tech., Res. Lab. of Electronics, Cambridge, Mass., Tech. Rept.
No. 238; April 3, 1952.

⁴ D. Gabor, “Communication theory and physics,” Phil. Mag.,
vol. 41, no. 7, pp. 1161-87; 1950.

⁵ L. Brillouin, “Science and Information Theory,” Academic
Press, Inc., New York, N. Y.; 1956.

⁶ D. M. MacKay, “Quantal aspects of scientific information,”
IRE TRANS. ON INFORMATION THEORY, vol. IT-1, pp. 60-80;
February, 1953.




and $h$ is Planck’s constant), and 2) the uncertainty
principle, $\Delta t\,\Delta E \ge h/2\pi$ (here $\Delta t$ and $\Delta E$ represent
uncertainty in time and energy respectively). Relation 1)
essentially establishes a discrete set of energy levels in
the channel, while 2) is automatically satisfied by application
of the sampling theorem. This will become apparent
as we proceed.

Consider a continuous source transmitting over a
bandwidth $\Delta f$, with center frequency $f_0$, such that
$\Delta f/f_0 \ll 1$. The sampling theorem states that such a signal
has $2\Delta f$ degrees of freedom (DOF) per second. These may
be expressed, for example, as amplitude and phase of the
carrier at intervals of $1/\Delta f$ seconds. For our purposes,
however, it is more instructive to formulate the problem
somewhat differently. Consider the channel as being made
up of rectangular cells in time-frequency space, of dimensions
$\Delta t$ and $\Delta f$, where $\Delta t = 1/(2\Delta f)$. Each cell represents
a DOF of the signal. The source transmits information
by locating an arbitrary number of photons in each
cell; the receiver is simply a photon counter. It can be seen
by using relations 1) and 2) that the uncertainty principle
is not violated by such a model. Although there is some
question as to how closely such a model may be approximated
in practice (Gabor, for example, bases his development
on the fact that the source cannot determine precisely
the number of photons in each cell), the above model
serves as a useful point of departure in considering
quantum effects. Since the system is assumed narrow-band,
all photons have nominally the same energy,
$E = hf_0$. Observe that in this model, amplitude resolution
improves with increasing power level and decreasing
frequency.

The photon source can now be defined by a discrete 
first-order probability distribution over the nonnegative 
integers. (Only zero memory sources will be considered 
here.) Having defined the photon channel, we are now in a
position to derive expressions for the entropy of photon 
sources. 


III. A MAXIMUM ENTROPY PHOTON SOURCE


Consider a source with probability distribution $p(n)$,
the probability of locating $n$ photons in a particular DOF.
The distribution is assumed identical for each DOF. The
entropy for this source is defined as⁷

$$H = -\sum_{n=0}^{\infty} p(n) \ln p(n) \quad \text{(natural units per DOF).} \tag{1}$$

(For convenience, natural logarithms will be used throughout.)

We now calculate the maximum entropy distribution
for a narrow-band source under the average power
constraint:

$$\sum_{n=0}^{\infty} n\,p(n) = N = \frac{P}{2hf_0\,\Delta f} \quad \text{photons/DOF,} \tag{2}$$

where $P$ = average power.


7 C. E. Shannon, op. cit., see p. 19. 


$N$, the “occupation number,” plays the fundamental role
in the derivations which follow.
Using the additional constraint

$$\sum_{n=0}^{\infty} p(n) = 1, \tag{3}$$


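we maximize with respect to $p(n)$, obtaining

$$\frac{\partial}{\partial p(n)} \Big[ -\sum_{n=0}^{\infty} p(n) \ln p(n) + \lambda \sum_{n=0}^{\infty} n\,p(n) + \mu \sum_{n=0}^{\infty} p(n) \Big] = 0, \tag{4}$$

where $\lambda$ and $\mu$ are Lagrange multipliers. Solving for $p(n)$
and substituting into (2) and (3), we obtain

$$p(n) = a e^{\lambda n}, \quad n = 0, 1, 2, \ldots, \qquad \text{where} \quad \lambda = -\ln\Big(1 + \frac{1}{N}\Big), \quad a = \frac{1}{1 + N}. \tag{5}$$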


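Thus, the maximum entropy photon distribution under
the average power constraint is exponential. Substituting
into (1), we obtain for the entropy,

$$H = -\sum_{n=0}^{\infty} a e^{\lambda n} \ln\big(a e^{\lambda n}\big) = -a \sum_{n=0}^{\infty} e^{\lambda n} (\ln a + \lambda n)$$

$$= -\frac{a \ln a}{1 - e^{\lambda}} - \frac{a \lambda e^{\lambda}}{(1 - e^{\lambda})^2} = N \ln\Big(1 + \frac{1}{N}\Big) + \ln(1 + N). \tag{6}$$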
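As a numerical sanity check on (5) and (6), the exponential distribution can be summed term by term and compared with the closed-form entropy. The sketch below is illustrative only; the function names and the truncation at 1000 terms are assumptions made here, not part of the derivation.

```python
import math


def max_entropy_pmf(mean_n, n):
    """p(n) = a * exp(lam * n) with a = 1/(1+N), lam = -ln(1 + 1/N), as in (5)."""
    a = 1.0 / (1.0 + mean_n)
    lam = -math.log(1.0 + 1.0 / mean_n)
    return a * math.exp(lam * n)


def closed_form_entropy(mean_n):
    """Entropy of (6): N ln(1 + 1/N) + ln(1 + N), natural units per DOF."""
    return mean_n * math.log(1.0 + 1.0 / mean_n) + math.log(1.0 + mean_n)


def summed_moments(mean_n, terms=1000):
    """Return (total probability, mean, entropy) from a truncated direct sum."""
    probs = [max_entropy_pmf(mean_n, n) for n in range(terms)]
    total = sum(probs)
    mean = sum(n * p for n, p in enumerate(probs))
    # Terms that underflow to zero contribute nothing to the entropy.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return total, mean, entropy


checks = {N: summed_moments(N) for N in (0.1, 2.0, 10.0)}
```

For each occupation number tried, the summed probability, mean and entropy should agree with 1, $N$ and (6) to within rounding, which confirms that the constants $a$ and $\lambda$ in (5) satisfy both constraints.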


It is of interest to examine the asymptotic behavior of
$H$. For $N \gg 1$, we have

$$H \approx \ln e(1 + N). \tag{7}$$


As should be expected, this approaches $\ln eN$, the
maximum entropy of a continuous (exponential) distribution
limited to the positive axis, with mean $N$. As
Shannon has pointed out, the entropy of a continuous
distribution has physical meaning only in a relative sense.
This becomes apparent in comparing the asymptotic
behavior of the discrete and continuous distributions as
$N \to 0$. The discrete expression goes to zero asymptotically:

$$H \approx -N \ln N \to 0 \quad \text{as } N \to 0. \tag{8}$$

The continuous expression, however, becomes negative
for $N < 1/e$ and hence is physically meaningless in this
range.
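The contrast between the discrete entropy (6) and the continuous expression $\ln eN$ can be tabulated directly. In the sketch below the function names and sample values of $N$ are hypothetical; it only evaluates the two formulas quoted in the text.

```python
import math


def h_discrete(mean_n):
    """Discrete (photon) entropy of (6)."""
    return mean_n * math.log(1.0 + 1.0 / mean_n) + math.log(1.0 + mean_n)


def h_continuous(mean_n):
    """Entropy ln(eN) of a continuous exponential density with mean N."""
    return 1.0 + math.log(mean_n)


# Large occupation number: the two expressions nearly coincide.
gap_large = h_discrete(1.0e4) - h_continuous(1.0e4)

# Below N = 1/e the continuous entropy is negative, the discrete one is not.
small_n = 0.3
continuous_small = h_continuous(small_n)
discrete_small = h_discrete(small_n)

# As N -> 0 the discrete entropy is dominated by -N ln N, as in (8).
tiny_n = 1.0e-6
ratio_tiny = h_discrete(tiny_n) / (-tiny_n * math.log(tiny_n))
```

The approach to $-N \ln N$ is slow, because the next term in the expansion is of order $N$; the ratio `ratio_tiny` is therefore close to, but still slightly above, one.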

Eq. (6) can now be used to derive the photon distribution
in frequency corresponding to a maximum entropy
wide-band source. As will be shown, this distribution is
similar to that for thermal noise. (The similarity of the
results in this section to statistical mechanical results
should be expected since the derivations are completely
analogous to those of statistical mechanics.) Assume that
the source is transmitting simultaneously and independently
over an infinite number of narrow-band
channels of bandwidth $\Delta f$. Let $N_j$ = occupation number
(mean number of photons per DOF) at frequency $f_j$. Then

$$H_j = N_j \ln\Big(1 + \frac{1}{N_j}\Big) + \ln(1 + N_j),$$

where $H_j$ = entropy per DOF at frequency $f_j$.


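$$H' = 2\,\Delta f \sum_{j=1}^{\infty} H_j, \tag{9}$$

where $H'$ = total entropy per unit time.
Expressing the average power constraint as

$$P = 2\,\Delta f \sum_{j=1}^{\infty} N_j h f_j, \tag{10}$$

we maximize subject to (10) to obtain

$$\frac{\partial}{\partial N_j} \Big\{ 2\,\Delta f \sum_{j=1}^{\infty} \Big[ N_j \ln\Big(1 + \frac{1}{N_j}\Big) + \ln(1 + N_j) \Big] + 2\lambda\,\Delta f \sum_{j=1}^{\infty} N_j h f_j \Big\} = 0, \qquad j = 1, 2, \ldots, \tag{11}$$

and

$$\ln\Big(1 + \frac{1}{N_j}\Big) + \lambda h f_j = 0, \qquad N_j = \frac{1}{e^{-\lambda h f_j} - 1}. \tag{12}$$

To evaluate $\lambda$ we substitute (12) into (10):

$$P = 2\,\Delta f \sum_{j=1}^{\infty} \frac{h f_j}{e^{-\lambda h f_j} - 1}, \tag{13}$$

where $f_j = j\,\Delta f$. Since $\Delta f$ may be chosen arbitrarily as long
as the relation $\Delta f\,\Delta t = 1/2$ is satisfied, $\lambda$ will be evaluated
for the limiting case $\Delta f \to 0$.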


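We have then

$$P = \lim_{\Delta f \to 0} \sum_{j=1}^{\infty} \frac{2hf_j\,\Delta f}{e^{-\lambda h f_j} - 1} = 2\int_0^{\infty} \frac{hf\,df}{e^{-\lambda h f} - 1} = \frac{\pi^2}{3\lambda^2 h} \quad \text{(total signal power).} \tag{14}$$

Solving for $\lambda$,

$$\lambda = -\frac{\pi}{\sqrt{3Ph}}.$$

In order to draw an analogy with thermal radiation,
we make the identification

$$\lambda = -\frac{1}{kT_s}, \qquad \text{or} \qquad T_s = \frac{\sqrt{3Ph}}{\pi k},$$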

Stern: Some Quantum Effects in Information Channels 


where $k$ = Boltzmann’s constant and $T_s$ = effective
signal temperature. Substituting into (12),

$$N(f) = \frac{1}{e^{hf/kT_s} - 1} \quad \text{photons/DOF.} \tag{15}$$


Eq. (15) is a Bose-Einstein distribution and is characteristic
of thermal radiation.⁸ The exponent $hf/kT_s$ in (15)
determines whether the system is in the “classical”
(continuous) range, or in the quantum range. Specifically,

$hf/kT_s \ll 1$: large occupation numbers (continuous), and

$hf/kT_s \gg 1$: small occupation numbers (quantum).


Since conventional noise theory normally assumes
large occupation numbers, no parallel can be drawn to
existing formulations for the case of small occupation
numbers. However, for the former case, it is of interest
to examine the expression for signal power density. From
(15) the average number of photons per unit time transmitted
in the frequency interval $(f, f + df)$ is

$$N'(f)\,df = 2N(f)\,df = \frac{2\,df}{e^{hf/kT_s} - 1} \quad \text{photons/second.}$$

$$dP = N'(f)\,hf\,df = \frac{2hf\,df}{e^{hf/kT_s} - 1} \tag{16}$$

$$\approx 2kT_s\,df \quad \text{for } \frac{hf}{kT_s} \ll 1. \tag{17}$$

Except for the factor of 2, (17) is identical to the Nyquist
expression for thermal noise.⁹ Going from (16) to (17), the
Bose-Einstein distribution was approximated for large
occupation numbers by the Boltzmann distribution. It is
not surprising that a signal resembling Nyquist (“white”)
noise should result in this case, since the Nyquist expression
is based on the assumption of large occupation
numbers. The factor of 2 may be accounted for by noting 
that in this system we are transmitting all photons in one 
direction: from the source to the receiver. In a system 
excited by thermal noise, half the noise power is flowing 
toward the source and the other half toward the receiver. 

An important consequence of the consideration of 
quantum effects is that (16), the power density spectrum 
for a maximum entropy source, trails off rapidly to zero 
at high frequencies. In contrast, a wide-band source based 
on the conventional model would have a completely 
uniform spectrum, requiring that there be zero power 
over any finite frequency range. 


⁸ W. P. Allis and M. A. Herlin, “Thermodynamics and Statistical
Mechanics,” McGraw-Hill Book Co., Inc., New York, N. Y., p.
221; 1952.

⁹ Ibid., p. 107.


IV. THE POISSON SOURCE


In the case of the conventional continuous channel, it
is convenient to work with Gaussian distributions for
many reasons, two of the most important being: 1) they
have maximum entropy under the average power constraint,
and 2) they are additive; i.e., a random process
which is the sum of two Gaussian processes is also
Gaussian. Unfortunately, the exponential photon distribution,
which was shown to possess maximum entropy
under the average power constraint, is not an additive
distribution. The Poisson distribution, however, does
have this property. Since additivity is a great convenience 
in calculating information rates, and since the Poisson 
distribution has other interesting and useful properties, 
some characteristics of the Poisson source will be explored 
in this section. 

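Consider a photon source with Poisson distribution:

$$p(n) = \frac{\lambda^n e^{-\lambda}}{n!}, \qquad n = 0, 1, 2, \ldots \quad \text{(photons/DOF).} \tag{18}$$

As is well known for this distribution,

$$\text{mean, } N = \lambda, \tag{19}$$

$$\text{variance, } \sigma^2 = \lambda, \tag{20}$$

and, if $p_3(n)$ is the distribution of the sum of two independent
Poisson variables with means $\lambda_1$ and $\lambda_2$, then

$$p_3(n) = \frac{\lambda_3^n e^{-\lambda_3}}{n!}, \tag{21}$$

where $\lambda_3 = \lambda_1 + \lambda_2$.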
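The additivity property (21) can be checked numerically by convolving two Poisson distributions. The sketch below is illustrative; the function names and the particular means 1.5 and 2.5 are assumptions chosen only for the example.

```python
import math


def poisson_pmf(lam, n):
    """Poisson probability lam^n e^{-lam} / n!, as in (18)."""
    return math.exp(-lam) * lam ** n / math.factorial(n)


def sum_pmf(lam1, lam2, n):
    """Probability that two independent Poisson counts add up to n."""
    return sum(poisson_pmf(lam1, k) * poisson_pmf(lam2, n - k)
               for k in range(n + 1))


lam1, lam2 = 1.5, 2.5
# Largest discrepancy between the convolution and a Poisson law of mean lam1 + lam2.
max_error = max(abs(sum_pmf(lam1, lam2, n) - poisson_pmf(lam1 + lam2, n))
                for n in range(25))
```

The agreement is exact apart from rounding, because the convolution sum is the binomial expansion of $(\lambda_1 + \lambda_2)^n / n!$ multiplied by $e^{-(\lambda_1 + \lambda_2)}$.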
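The entropy of the Poisson source can be calculated in
a straightforward manner as follows:

$$H(\lambda) = -\sum_{n=0}^{\infty} p(n) \ln p(n)$$

$$= -e^{-\lambda} \sum_{n=0}^{\infty} \frac{\lambda^n}{n!} \big[ n \ln \lambda - \ln n! - \lambda \big]$$

$$= \lambda - \lambda \ln \lambda + e^{-\lambda} \sum_{n=0}^{\infty} \frac{\lambda^n \ln n!}{n!}$$

$$= N[1 - \ln N] + e^{-N} \sum_{n=0}^{\infty} \frac{N^n \ln n!}{n!}, \tag{22}$$

where in the last line, $\lambda$ has been replaced by $N$, to keep
the notation for the average number of photons per DOF
consistent with Section III.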

The entropy for the Poisson source cannot be expressed 
in closed form. However, most of its useful properties 
can be deduced by examining its asymptotic behavior for 
$N \ll 1$ and $N \gg 1$. We observe from (22) that


$$H(\text{Poisson}) \to -N \ln N \to H(\text{exponential}) \quad \text{for } N \ll 1. \tag{23}$$

For $N \gg 1$ we note the following relation between the
Poisson and Gaussian distribution:¹⁰

$$\sum_{\alpha\lambda^{1/2} < (n - \lambda) < \beta\lambda^{1/2}} \frac{\lambda^n e^{-\lambda}}{n!} \to \Phi(\beta) - \Phi(\alpha) \quad \text{as } \lambda \to \infty, \tag{24}$$

where $\Phi(x)$ is the Gaussian cumulative distribution function
with unity variance and zero mean.

Eq. (24), a result of the central limit theorem, is
essentially a statement of the fact that for sufficiently
large $\lambda$, the envelope of the Poisson distribution is approximated
arbitrarily closely by the Gaussian density function
with mean and variance $\lambda$. Fig. 1 is a comparison of the
two for $N = \lambda = 10$. Using Shannon’s expression for the
entropy of a Gaussian source, we have

$$H(\text{Poisson}) \to \tfrac{1}{2} \ln 2\pi e + \tfrac{1}{2} \ln N = H(\text{Gaussian}) \quad \text{for } N \gg 1. \tag{25}$$


Fig. 1—Comparison of Gaussian and Poisson distributions.


A comparison of the entropies for the exponential, Poisson
and Gaussian sources is shown in Fig. 2. (The Gaussian
source has a continuous amplitude distribution with
mean and variance $N$, while the other two are discrete
photon sources with mean $N$.) In Fig. 2, the entropy for
the Poisson distribution is approximated by a partial
summation of the series to the point where all further

¹⁰ W. Feller, “An Introduction to Probability Theory and Its
Applications,” John Wiley and Sons, Inc., New York, N. Y.




terms become negligible according to the computational
scheme used. This results in summing approximately
$(N + 7N^{1/2})$ terms. Note that the asymptotic expressions
given in (23) and (25) offer excellent approximations for
the entropy of the Poisson source in the ranges $N < 0.4$
and $N > 4$. Note also that for large $N$ the entropy of
the Poisson source becomes approximately one-half the
maximum entropy obtainable under the average power
constraint. Since quantum effects have been taken into
account in the calculation of the entropies of the two
discrete sources, they remain positive for all $N$. However,
entropy of the Gaussian source becomes negative for
$N < 1/2\pi e$.
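The partial summation described above can be sketched as follows. This is a minimal illustration, not the computational scheme actually used for Fig. 2: the truncation at roughly $N + 7N^{1/2}$ terms plus a small fixed margin, and all function names, are assumptions made here. The series is that of (22), with $\ln n!$ obtained from the log-gamma function.

```python
import math


def poisson_entropy(mean_n):
    """Entropy (22): N(1 - ln N) + sum over n of p(n) ln n!, truncated."""
    terms = int(mean_n + 7.0 * math.sqrt(mean_n)) + 10   # assumed cutoff
    tail_sum = 0.0
    for n in range(terms + 1):
        log_factorial = math.lgamma(n + 1)
        log_p = -mean_n + n * math.log(mean_n) - log_factorial
        tail_sum += math.exp(log_p) * log_factorial
    return mean_n * (1.0 - math.log(mean_n)) + tail_sum


def exponential_entropy(mean_n):
    """Entropy (6) of the maximum entropy (exponential) photon source."""
    return mean_n * math.log(1.0 + 1.0 / mean_n) + math.log(1.0 + mean_n)


def gaussian_entropy(variance):
    """Entropy (1/2) ln(2 pi e variance) of a continuous Gaussian source."""
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)


small_gap = abs(poisson_entropy(0.01) - exponential_entropy(0.01))
large_gap = abs(poisson_entropy(50.0) - gaussian_entropy(50.0))
```

With these definitions the Poisson entropy stays close to the exponential entropy at small occupation numbers and close to the Gaussian entropy at large ones, which is the qualitative picture the text attributes to Fig. 2.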


Fig. 2—Entropies as a function of mean, $N$. [Curves: H (Poisson), H (Gaussian) (with mean = variance), H (Exponential).]


V. THE POISSON CHANNEL


We are now in a position to consider the transmission
rate through a photon channel in the presence of additive
noise. (A more interesting type of nonadditive noise will
be considered in a forthcoming paper.) In order to make
the calculations tractable, it will be assumed here that
the signal and noise are independent Poisson processes
with means $N_s$ and $N_n$, respectively. The channel will
also be assumed to have a narrow bandwidth $\Delta f$. Under
the above conditions, we may express the transmission
rate $R'$ as

$$R' = H(\text{Signal} + \text{Noise}) - H(\text{Noise}) \quad \text{(natural units per DOF).}$$

Thus

$$R' = H(\text{Poisson}, N_s + N_n) - H(\text{Poisson}, N_n). \tag{26}$$


Using (25), we may write

$$R' \approx \tfrac{1}{2} \ln\Big(1 + \frac{N_s}{N_n}\Big) \quad \text{for } N_n \gg 1.$$

Since $2\Delta f$ DOF are transmitted per second,

$$R \approx \Delta f \ln\Big(1 + \frac{S}{N}\Big) \quad \text{(natural units per second)} \tag{27a}$$

where S/N, the signal-to-noise power ratio has been 
substituted for N,/N,, the signal-to-noise occupation 
number ratio. Eq. (27a) will be recognized as Shannon’s 
expression for the capacity of the continuous additive 
Gaussian channel. One obtains quite different results 
with small occupation numbers, however. A list of some 
cases of interest follows: 


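$$R' \approx \tfrac{1}{2} \ln\big[2\pi e (N_s + N_n) N_n^{2N_n}\big], \qquad N_s \gg 1,\ N_n \ll 1, \tag{27b}$$

$$R' \approx \tfrac{1}{2} \ln(2\pi e N_s), \qquad N_s \gg 1,\ N_n \to 0, \tag{27c}$$

$$R' \approx N_n \ln N_n - (N_s + N_n) \ln(N_s + N_n), \qquad N_s \ll 1,\ N_n \ll 1, \tag{27d}$$

and

$$R' \to 0, \qquad N_s/N_n = \text{constant},\ N_s \to 0. \tag{27e}$$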

It is interesting to review (27a)-(27e), observing how and
when they differ from the expression for the analogous
continuous channel. A summary of the notable characteristics
is given below:


1) $R'$ approaches the continuous expression for large
noise occupation numbers. [See (27a).]

2) $R'$ remains finite for all finite signal occupation
numbers no matter what the noise power is. [See
(27c).]

3) $R'$ goes to zero when the signal occupation number
is zero no matter what the noise power is. [See
(27e).]

4) $R'$ approaches channel capacity $C$ for small signal
and noise occupation numbers. (Assuming the
average power constraint.)
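
The first three characteristics can be checked numerically. The following is a minimal sketch, assuming the rate per degree of freedom is the difference of two Poisson entropies as in (26) and evaluating each entropy by direct summation. The function names `poisson_entropy` and `rate_per_dof` are illustrative and do not come from the paper.

```python
import math

def poisson_entropy(mean: float) -> float:
    """Entropy (in natural units) of a Poisson distribution, by direct summation."""
    if mean <= 0.0:
        return 0.0
    # Sum well past the bulk of the distribution; the neglected tail is negligible.
    k_max = int(mean + 12.0 * math.sqrt(mean) + 30)
    entropy = 0.0
    for k in range(k_max + 1):
        log_p = -mean + k * math.log(mean) - math.lgamma(k + 1)
        entropy -= math.exp(log_p) * log_p
    return entropy

def rate_per_dof(n_signal: float, n_noise: float) -> float:
    """R' = H(Poisson, Ns + Nn) - H(Poisson, Nn), in natural units per DOF."""
    return poisson_entropy(n_signal + n_noise) - poisson_entropy(n_noise)

# 1) Large noise occupation number: R' is close to (1/2) ln(1 + Ns/Nn).
print(rate_per_dof(200.0, 200.0), 0.5 * math.log(2.0))
# 2) Vanishing noise: R' stays finite, close to (1/2) ln(2 pi e Ns).
print(rate_per_dof(50.0, 0.0), 0.5 * math.log(2 * math.pi * math.e * 50.0))
# 3) No signal: R' is zero whatever the noise level.
print(rate_per_dof(0.0, 3.0))
```

The direct sum avoids relying on the large-mean approximation, so the same routine can be used to explore the small-occupation-number cases as well.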

We deduce 4) quite easily. Channel capacity $C$ is
defined as

$$C = \max_{p(n)} R'$$

where $p(n)$ = source probability distribution.

Since H(Noise) is independent of $p(n)$, channel capacity
is attained when H(Signal + Noise) is maximized. Clearly,
the maximum is attained when H(Signal + Noise)
corresponds to an exponential distribution. But for small
occupation numbers, H(Poisson) → H(Exponential); thus
the maximum is approached asymptotically in the Poisson
channel for small occupation numbers.
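
The convergence of the two entropies at small occupation numbers can be illustrated numerically. This sketch assumes the exponential photon distribution is the geometric law with mean N, whose entropy is (1 + N) ln(1 + N) - N ln N; that closed form and the function names are supplied here for illustration and are not quoted from the paper.

```python
import math

def poisson_entropy(mean: float) -> float:
    """Entropy (natural units) of a Poisson distribution, by direct summation."""
    if mean <= 0.0:
        return 0.0
    k_max = int(mean + 12.0 * math.sqrt(mean) + 30)
    entropy = 0.0
    for k in range(k_max + 1):
        log_p = -mean + k * math.log(mean) - math.lgamma(k + 1)
        entropy -= math.exp(log_p) * log_p
    return entropy

def exponential_entropy(mean: float) -> float:
    """Entropy of the geometric law p(n) = mean**n / (1 + mean)**(n + 1)."""
    if mean <= 0.0:
        return 0.0
    return (1.0 + mean) * math.log(1.0 + mean) - mean * math.log(mean)

# The ratio rises toward 1 as the mean occupation number shrinks.
for mean in (10.0, 1.0, 0.1, 0.001):
    ratio = poisson_entropy(mean) / exponential_entropy(mean)
    print(f"N = {mean:g}: H(Poisson)/H(Exponential) = {ratio:.4f}")
```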




VI. CONCLUSIONS


In order to circumvent some of the spurious results
which are obtained by using the continuum as a physical
and mathematical model for an information channel, a
discrete photon channel has been postulated. For such a
channel, the maximum entropy source under an average
power constraint has been shown to have an exponential
photon distribution with a Bose-Einstein frequency
dependence. By use of the Bose-Einstein statistics, an
analogy has been drawn between the maximum entropy
source and wide-band thermal radiation at an equivalent
temperature $T_e$. In contrast to continuous sources, the
entropy of the discrete source has been shown to be well
behaved for small occupation numbers and infinite
bandwidths.

The Poisson source has been used to investigate transmission
rate in the presence of additive noise. The results
have been shown to approximate those of Shannon for the
continuous Gaussian channel in the case of large occupation
numbers, and to be more in keeping with physical
limitations for small occupation numbers.


Spectral Analysis of a Process of Randomly Delayed Pulses*


M. V. JOHNS, JR.†


Summary—Formulas are obtained for the steady-state covariance 
function of a process of pulses separated by independent random 
time delays. The pulses considered may be either stochastic or 
deterministic in character. Since such processes approach station- 
arity in time their steady-state spectral density functions may be 
obtained. Explicit results are given for two examples. 


I. INTRODUCTION 


THE purpose of this paper is to obtain formulas for
the covariance and steady-state spectral distribution
functions of a process consisting of a sequence
of pulses separated by random time delays. Such a process 
will not be precisely stationary in time but will approach 
a steady-state condition susceptible to spectral analysis. 
The wave forms of the pulses are represented by the sample 
functions of a sequence of independent stochastic processes 
defined over a finite time interval corresponding to the 
pulse width, and having identical mean and covariance 
functions. The time delays are represented by independent 
identically distributed non-negative random variables 
possessing a common density function satisfying certain 
mild restrictions. The derivation of these formulas makes 
essential use of certain recent developments in renewal 
theory, a summary of which may be found in [5]. 
Processes of the type described above arise in com- 
munication theory in connection with the analysis of 
asynchronous multiplexing systems such as those dis- 
cussed in [4]. In such multiplexing systems, a number of 


* Received by the PGIT, November 10, 1959. Work sponsored in 
part by the Office of Naval Research, Contract No. N6onr-25140. 
† Dept. of Statistics, Stanford University, Stanford, Calif.


unsynchronized transmitters inject signals into a common
medium to which a similar number of receivers are
coupled. The signal emitted by a typical transmitter
consists of a sequence of pulses separated in time by
means of a random delay mechanism. Each pulse contains
a fragment of the message to be transmitted and possesses
features which identify the transmitter. The mathematical
model developed in detail in Section II attempts to
represent the output of such a transmitter closely enough
for purposes of spectral analysis.


II. THE MATHEMATICAL MODEL


The structure of the stochastic process $X(t)$, $t > 0$,
whose covariance and asymptotic spectral distribution
functions we wish to obtain, is defined in terms of the
following quantities:

Let the random variables $T_i$, $i = 1, 2, \cdots$, be
independently and identically distributed with common
distribution function $F(t)$, such that $F(0) = 0$ and
$F'(t) = f(t)$ exists, and let $\mu = ET_i < \infty$. The $T_i$
represent the random time delays between successive
pulses of the process $X(t)$. Let

$$S_n = \sum_{i=1}^{n} T_i, \quad n = 1, 2, \cdots, \tag{1}$$

and, for $t > 0$, let

$$N(t) = \max\{n : S_n \le t\}, \tag{2}$$

$$U(t) = t - S_{N(t)}, \tag{3}$$

and

$$V(t) = S_{N(t)+1} - t, \tag{4}$$

where $S_0 = 0$, so that $N(t) = 0$ and $U(t) = t$ when $T_1 > t$.
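
The renewal quantities just defined are easy to compute for a concrete list of delays. The sketch below assumes the convention that the zeroth partial sum is zero, so that N(t) = 0 before the first renewal; the function name `renewal_quantities` is hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

def renewal_quantities(delays, t):
    """Return (N(t), U(t), V(t)) for delays T_1, T_2, ... and a time t > 0."""
    epochs = list(accumulate(delays))        # partial sums S_1, S_2, ...
    if t >= epochs[-1]:
        raise ValueError("need more delays to cover time t")
    n = bisect_right(epochs, t)              # number of S_k <= t, i.e. N(t)
    last = epochs[n - 1] if n > 0 else 0.0   # S_{N(t)}, taking S_0 = 0 (assumption)
    return n, t - last, epochs[n] - t

# Delays 1, 2, 0.5, 3 give renewal epochs 1, 3, 3.5, 6.5.
print(renewal_quantities([1.0, 2.0, 0.5, 3.0], 3.25))  # (2, 0.25, 0.25)
```

Here U(t) is the time elapsed since the most recent renewal and V(t) is the time remaining until the next one.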


The study of the properties of these quantities is a standard
part of renewal theory. Let $Y_i(t)$, $0 \le t \le c$, $i = 1,
2, \cdots$, be independent stochastic processes representing
the successive pulses (of time length $c$) of the $X(t)$ process.
(In particular instances, $Y_i(t)$ might be assumed to be a
deterministic function giving the form of the $i$th pulse,
or it might be assumed to be partly stochastic.) We assume
that $Y_i(t) = 0$ for $t$ not in the interval $[0, c]$. We further
assume that the functions $\xi(t)$ and $\varphi(s, t)$, defined for
$s, t \ge 0$ by

$$\xi(t) = EY_i(t) \tag{5}$$

and

$$\varphi(s, t) = EY_i(s)Y_i(t), \tag{6}$$

are finite valued, measurable and independent of $i$.
We may now represent the $X(t)$ process as follows:
For $t > 0$,

$$X(t) = Y_{N(t)}[U(t)]. \tag{7}$$

The quantities defined above are illustrated in Fig. 1.
This general formulation should be contrasted with that
considered by Fortet in [2].


[Fig. 1: sketch of $X(t)$ illustrating the quantities defined above.]


III. THE COVARIANCE AND SPECTRAL DENSITY FUNCTIONS


We first compute formulas for the covariance function
of the $X(t)$ process and then, noting that $X(t)$ is asymptotically
stationary, we may compute its asymptotic
or steady-state spectral density function. We observe that

$$P\{U(t) < u\} = P\{t - S_{N(t)} < u\}$$

$$= \sum_{n=1}^{\infty} P\{t - u < S_n \le t,\ S_{n+1} > t\}, \quad 0 < u < t,$$

$$= \begin{cases} \displaystyle\sum_{n=1}^{\infty} \int_{t-u}^{t} [1 - F(t - y)]\, dF^{(n)}(y), & 0 < u < t, \\ F(t), & u = t, \end{cases} \tag{8}$$

where $F^{(n)}$ represents the $n$-fold convolution of $F$.
Now, for $0 < u < t$, let the probability density function
of $U(t)$ be given by

$$g_t(u) = \frac{d}{du} P\{U(t) < u\} = [1 - F(u)] \sum_{n=1}^{\infty} f^{(n)}(t - u), \tag{9}$$




provided that the sum on the right converges uniformly
in $u$ (which we will assume), where $f^{(n)}$ represents the
derivative of $F^{(n)}$. We note that the probability of the
event $U(t) = t$ is the probability that $N(t) = 0$ (i.e., that
$T_1 > t$) and is equal to $1 - F(t)$. For $0 < u < t$, let the
conditional probability density function of $V(t)$, given
that $U(t) = u$, be given by

$$g(v \mid u) = \frac{d}{dv} P\{V(t) < v \mid U(t) = u\} = \frac{d}{dv}\left[\frac{F(u + v) - F(u)}{1 - F(u)}\right] = \frac{f(u + v)}{1 - F(u)}. \tag{10}$$

Then, letting $g_t(v, u) = g(v \mid u)\, g_t(u)$, we see that $g_t(v, u)$
represents the joint probability density function for $V(t)$
and $U(t)$ everywhere except for points where $U(t) = t$,
for which no density exists. Now

$$E\{X(s)X(s+t)\} = E\{X(s)X(s+t) \mid V(s) \le t\}\, P\{V(s) \le t\} + E\{X(s)X(s+t) \mid V(s) > t\}\, P\{V(s) > t\}. \tag{11}$$


If we designate the first and second terms of the right-hand
side of (11) by $E_1(s, t)$, $E_2(s, t)$, respectively, we have

$$\begin{aligned}
E_1(s, t) &= E\{E[X(s)X(s+t) \mid U(s), V(s)] \mid V(s) \le t\}\, P\{V(s) \le t\} \\
&= E\{E[X(s) \mid U(s), V(s)]\, E[X(s+t) \mid U(s), V(s)] \mid V(s) \le t\}\, P\{V(s) \le t\}
\end{aligned} \tag{12}$$

since $X(s)$ and $X(s + t)$ are conditionally independent
given $U(s)$ and $V(s)$ when $V(s) \le t$. Furthermore, $X(s)$
and $V(s)$ are conditionally independent given $U(s)$, so that

$$E\{X(s) \mid U(s), V(s)\} = E\{X(s) \mid U(s)\} = E\{Y_{N(s)}(U(s)) \mid U(s)\} = \xi(U(s)). \tag{13}$$

Similarly, for $V(s) \le t$,

$$\begin{aligned}
E\{X(s+t) \mid U(s), V(s)\} &= E\{X(s+t) \mid V(s)\} \\
&= E\{E[X(s+t) \mid U(s+t), V(s)] \mid V(s)\} \\
&= E\{E[X(s+t) \mid U(s+t)] \mid V(s)\} \\
&= E\{E[Y_{N(s+t)}(U(s+t)) \mid U(s+t)] \mid V(s)\} \\
&= E\{\xi(U(s+t)) \mid V(s)\}.
\end{aligned} \tag{14}$$

Now, for $0 < v \le t$ and $0 \le x \le t - v$,

$$P\{U(s+t) < x \mid V(s) = v\} = P\{U(t - v) < x\}. \tag{15}$$

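Hence, from (8), (9) and (14), for $0 < v < t$,

$$E\{X(s+t) \mid V(s) = v\} = \int_0^{t-v} \xi(x)\, g_{t-v}(x)\, dx + \xi(t - v)[1 - F(t - v)]. \tag{16}$$

Hence, from (12), (13), (14) and (16),

$$\begin{aligned}
E_1(s, t) ={}& \int_0^s \int_0^t \xi(u) \left\{ \int_0^{t-v} \xi(x)\, g_{t-v}(x)\, dx + \xi(t - v)[1 - F(t - v)] \right\} g_s(v, u)\, dv\, du \\
&+ \xi(s) \int_0^t \left\{ \int_0^{t-v} \xi(x)\, g_{t-v}(x)\, dx + \xi(t - v)[1 - F(t - v)] \right\} [1 - F(s)]\, g(v \mid s)\, dv.
\end{aligned} \tag{17}$$

Now the only case for which $E_2(s, t)$ may be nonzero is
when $0 < t < c$, since otherwise $X(s + t)$ is certainly
zero. For $0 < t < c$, we have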


$$\begin{aligned}
E_2(s, t) &= E\{E[X(s)X(s+t) \mid U(s), V(s)] \mid V(s) > t\}\, P\{V(s) > t\} \\
&= E\{E[Y_{N(s)}(U(s))\, Y_{N(s)}(U(s) + t) \mid U(s)] \mid V(s) > t\}\, P\{V(s) > t\} \\
&= E\{\varphi(U(s), U(s) + t) \mid V(s) > t\}\, P\{V(s) > t\} \\
&= \int_0^s \varphi(u, u+t) \int_t^\infty g_s(v, u)\, dv\, du + \varphi(s, s+t)[1 - F(s)] \int_t^\infty g(v \mid s)\, dv.
\end{aligned} \tag{18}$$

From (6) we see that the above expression is automatically
zero if $t > c$ so that it holds for all $t > 0$. We have now
obtained the formulas necessary to compute $EX(s)X(s + t)$
and need only to find an expression for $EX(t)$ in order to
be able to compute the covariance function of $X(t)$. But

$$\begin{aligned}
EX(t) &= E\{E[X(t) \mid U(t)]\} \\
&= E\{E[Y_{N(t)}(U(t)) \mid U(t)]\} \\
&= E\{\xi(U(t))\} \\
&= \int_0^t \xi(u)\, g_t(u)\, du + \xi(t)[1 - F(t)].
\end{aligned} \tag{19}$$

It should be noted that a sufficient (although certainly not
necessary) condition for the finiteness of the integrals
appearing in (17), (18) and (19) is that the functions $\xi$
and $\varphi$ be bounded.

By definition, the covariance function $R(s, t)$ of $X(t)$ is
given by

$$R(s, t) = EX(s)X(t) - EX(s)\,EX(t). \tag{20}$$




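Hence,

$$R(s, s+t) = E_1(s, t) + E_2(s, t) - EX(s)\,EX(s+t), \tag{21}$$

which may be computed from (17), (18) and (19). For
purposes of spectral analysis, however, we wish to take
advantage of the asymptotic stationarity of the $X(t)$
process and obtain the limiting covariance function

$$R^*(t) = \lim_{s \to \infty} R(s, s+t). \tag{22}$$

To this end, we first note that a well-known theorem of
renewal theory [5] assures us that the so-called "renewal
density" $h(x)$ of the $T_i$'s defined by

$$h(x) = \sum_{n=1}^{\infty} f^{(n)}(x) \tag{23}$$

has the property that

$$\lim_{x \to \infty} h(x) = \frac{1}{\mu} \tag{24}$$

provided only that $f(x) \to 0$ as $x \to \infty$ and $|f(x)|^p$ is
integrable for some $p > 1$. We assume henceforth that
these conditions are satisfied. Now, from (9) we have

$$g_s(u) = [1 - F(u)]\, h(s - u), \tag{25}$$

so that for each $u$

$$\lim_{s \to \infty} g_s(u) = \frac{1}{\mu}[1 - F(u)], \tag{26}$$

and, similarly, from (10),

$$\lim_{s \to \infty} g_s(v, u) = \frac{1}{\mu} f(v + u). \tag{27}$$

Hence, noting that the second term of the right-hand side
of (17) vanishes for $s > c$, we have

$$\begin{aligned}
\lim_{s \to \infty} E_1(s, t) ={}& \frac{1}{\mu} \int_0^\infty \int_0^t \int_0^{t-v} \xi(u) f(v + u)\, \xi(x)[1 - F(x)]\, h(t - v - x)\, dx\, dv\, du \\
&+ \frac{1}{\mu} \int_0^\infty \int_0^t \xi(u)\, \xi(t - v)[1 - F(t - v)]\, f(v + u)\, dv\, du.
\end{aligned} \tag{28}$$

Similarly, from (18),

$$\lim_{s \to \infty} E_2(s, t) = \frac{1}{\mu} \int_0^\infty \varphi(u, u+t) \int_t^\infty f(v + u)\, dv\, du = \frac{1}{\mu} \int_0^\infty \varphi(u, u+t)[1 - F(t + u)]\, du, \tag{29}$$

and from (19),

$$\lim_{s \to \infty} EX(s) = \frac{1}{\mu} \int_0^\infty \xi(u)[1 - F(u)]\, du. \tag{30}$$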


It is easily verified that a sufficient condition for the
validity of the interchange of limits and integration
performed in obtaining the above expressions is again
that $\xi$ and $\varphi$ be bounded functions. Hence, finally, noting
that the functions $\xi$ and $\varphi$ vanish when their arguments
fall outside the interval $[0, c]$, we may write

$$\begin{aligned}
R^*(t) ={}& \frac{1}{\mu} \int_0^c \int_0^t \xi(u) f(v + u) \int_0^{\min(t-v,\,c)} \xi(x)[1 - F(x)]\, h(t - v - x)\, dx\, dv\, du \\
&+ \frac{1}{\mu} \int_0^c \int_0^t \xi(u)\, \xi(t - v)[1 - F(t - v)]\, f(v + u)\, dv\, du \\
&+ \frac{1}{\mu} \int_0^{\max(0,\,c-t)} \varphi(u, u+t)[1 - F(t + u)]\, du \\
&- \left[ \frac{1}{\mu} \int_0^c \xi(u)[1 - F(u)]\, du \right]^2.
\end{aligned} \tag{31}$$

The asymptotic spectral distribution function $F^*$ of the
process $X(t)$ is obtained from $R^*$ by means of the usual
formula

$$\tfrac{1}{2}[F^*(\lambda+) + F^*(\lambda-)] - F^*(0) = 4 \int_0^\infty R^*(t)\, \frac{\sin 2\pi\lambda t}{2\pi t}\, dt, \quad \lambda \ge 0, \tag{32}$$

which may be found, for example, in [1]. If it exists, the
spectral density function $f^* = F^{*\prime}$ is given by

$$f^*(\lambda) = 4 \int_0^\infty R^*(t) \cos 2\pi\lambda t\, dt. \tag{33}$$

It can be shown [3] that the spectrum of an asymptotically
stationary process has the same interpretation as in the
case of a strictly stationary process.

Example 1: To illustrate the application of the formulas
derived in the foregoing section, we consider the particular
case where the pulse delay times have exponential distributions
and the pulses are rectangular with unit amplitude
and time duration $c$: Let

$$F(t) = \begin{cases} 1 - e^{-t/\mu}, & t \ge 0, \\ 0, & t < 0, \end{cases} \tag{34}$$

so that

$$f(t) = \begin{cases} \dfrac{1}{\mu} e^{-t/\mu}, & t \ge 0, \\ 0, & t < 0. \end{cases} \tag{35}$$

For this case, it is easily verified that

$$h(x) = \frac{1}{\mu}, \quad x \ge 0. \tag{36}$$

We assume that for each $n$,

$$Y_n(t) = \begin{cases} 1, & 0 \le t \le c, \\ 0, & \text{otherwise} \end{cases} \tag{37}$$

with probability one, so that

$$\xi(t) = \begin{cases} 1, & 0 \le t \le c, \\ 0, & \text{otherwise,} \end{cases} \tag{38}$$

and

$$\varphi(s, t) = \begin{cases} 1, & 0 \le s, t \le c, \\ 0, & \text{otherwise.} \end{cases} \tag{39}$$

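Elementary computation shows that for this case (31)
becomes

$$R^*(t) = \begin{cases} e^{-c/\mu}\left(e^{-t/\mu} - e^{-c/\mu}\right), & 0 \le t \le c, \\ 0, & t > c, \end{cases} \tag{40}$$

and the corresponding spectral density function given by
(33) is

$$f^*(\lambda) = \frac{4 e^{-c/\mu}}{1 + (2\pi\lambda\mu)^2} \left\{ \mu - \frac{e^{-c/\mu}}{2\pi\lambda} \sin 2\pi\lambda c - \mu e^{-c/\mu} \cos 2\pi\lambda c \right\}. \tag{41}$$

The functions $R^*(t)$ and $f^*(\lambda)$ are shown in Figs. 2 and 3
for the case $c = 1$, $\mu = 5$.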
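
As a numerical cross-check on this example, the sketch below evaluates the cosine transform (33) by Simpson's rule for unit rectangular pulses with exponential delays. It assumes the limiting covariance is e^(-c/μ)(e^(-t/μ) - e^(-c/μ)) on 0 ≤ t ≤ c and zero elsewhere, which follows from a direct calculation for this case; both that expression and the closed form in `density_closed_form` are worked out here and should be read as assumptions rather than quoted results. All function names are illustrative.

```python
import math

def covariance(t, c=1.0, mu=5.0):
    """Assumed limiting covariance R*(t): unit rectangular pulses, exponential delays."""
    if 0.0 <= t <= c:
        return math.exp(-c / mu) * (math.exp(-t / mu) - math.exp(-c / mu))
    return 0.0

def density_numeric(lam, c=1.0, mu=5.0, steps=2000):
    """4 * integral over [0, c] of R*(t) cos(2 pi lam t) dt, by Simpson's rule."""
    h = c / steps  # steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        t = i * h
        total += weight * covariance(t, c, mu) * math.cos(2 * math.pi * lam * t)
    return 4.0 * total * h / 3.0

def density_closed_form(lam, c=1.0, mu=5.0):
    """Closed form of the same integral for lam > 0 (derived under the assumption above)."""
    w = 2 * math.pi * lam
    e = math.exp(-c / mu)
    return 4 * e / (1 + (w * mu) ** 2) * (
        mu - e * math.sin(w * c) / w - mu * e * math.cos(w * c)
    )

for lam in (0.25, 0.5, 1.0):
    print(lam, density_numeric(lam), density_closed_form(lam))
```

The two columns should agree to many decimal places for c = 1, μ = 5, the case plotted in the figures.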


Fig. 2—$R^*(t)$, example 1.


Fig. 3—$f^*(\lambda)$, example 1.


In many cases, the function h(x) cannot be expressed 
in closed form, so that it is impossible to obtain a closed 
expression for R*(t). Another example for which the func- 
tion h(x) does have a closed form expression follows. 




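Example 2: Suppose that the probability density function
for the delay times is given by

$$f(t) = \begin{cases} \dfrac{4t}{\mu^2}\, e^{-2t/\mu}, & t \ge 0, \\ 0, & t < 0. \end{cases} \tag{42}$$

For this case, it is easily verified that

$$h(x) = \frac{1}{\mu}\left(1 - e^{-4x/\mu}\right), \quad x \ge 0. \tag{43}$$

If the quantities $Y_n(t)$, $\xi(t)$ and $\varphi(s, t)$ are as in (37), (38)
and (39), then a rather involved computation shows that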


2 
qaeria | ae ia otf) ceed ok °) ; t = C 
m 


Re = 
ies) ee C, 
Lu 
(44) 
and 
ec 4 Qu 
*/ ee 
ffFQ) = eG) w+ 2+ 77 
e **/* cog 2m)c Qu } 
~ 1+ Gray)? {. ea Se Sh 
_ e*/* sin Qrde \ og 4c i 2 \ 
7 pee ee aN) | Le 
20"e —4c/u s Y 
oo ee Gat {m sin 2rdc — te 2ane} (45) 


The functions $R^*(t)$ and $f^*(\lambda)$ are shown for this example
in Figs. 4 and 5 for the case $c = 1$, $\mu = 5$.


ACKNOWLEDGMENT 


The author is indebted to Mrs. Ann Hillier and Mrs. 
Barbara Miller for performing the calculations necessary 
for Figs. 2-5. 


Fig. 4—$R^*(t)$, example 2.


Fig. 5—$f^*(\lambda)$, example 2.


BIBLIOGRAPHY 


[1] J. L. Doob, "Stochastic Processes," John Wiley and Sons, Inc.,
New York, N. Y.; 1953.

[2] R. M. Fortet, "Average spectrum of a periodic series of identical
pulses randomly displaced and distorted," Elec. Commun., vol.
31, no. 4, pp. 283-287; 1954.

[3] E. Parzen, "Asymptotically Stationary Stochastic Processes,"
(unpublished).

[4] J. R. Pierce and A. L. Hopper, "Nonsynchronous time division
with holding and with random sampling," Proc. IRE, vol. 40,
pp. 1079-1088; September, 1950.

[5] W. L. Smith, "Renewal theory and its ramifications," J. Royal
Statistical Soc., ser. B, vol. 20, no. 2, pp. 243-284; 1958.




Binary Codes with Specified Minimum Distance* 


MORRIS PLOTKIN†


Summary—Two n-digit sequences, called "points," of binary
digits are said to be at distance d if exactly d corresponding digits
are unlike in the two sequences. The construction of sets of points,
called codes, in which some specified minimum distance is maintained
between pairs of points is of interest in the design of self-checking
systems for computing with or transmitting binary digits,
the minimum distance being the minimum number of digital errors
required to produce an undetected error in the system output.
Previous work in the field had established general upper bounds
for the number of n-digit points in codes of minimum distance d
with certain properties. This paper gives new results in the field
in the form of theorems which permit systematic construction of
codes for given n, d; for some n, d, the codes contain the greatest
possible numbers of points.


BY the use of redundancy, it is possible to encode
messages for transmission in such a way that
errors in transmission may be corrected, provided
they are not too dense. For the special case of transmission
by means of binary digits, with fixed-length words, this
paper investigates the relationships among word length,
number of words in the code and number of errors in a
word that can be corrected. The best codes known, with
respect to these relationships but not to mechanizability,
are given in Table I.

In this paper, n-digit binary numbers are regarded as
points in an n-dimensional space. The word "point"
denotes a binary number, or more accurately, a sequence
of binary digits, since the ordinary arithmetical properties
of binary numbers are not utilized.¹ Two n-digit points
are said to be "at distance d" if they differ in exactly d
corresponding digits. For example, the points


1011101000 
and 
0111001001 


are at distance 4, the first, second, fifth, and last digits
being different for the two points. A set of n-digit points is
called a "code of minimum distance d" if each point is at
distance at least d from every other point of the set. The
6-digit points


000000 010101 
111000 101101 
100110 110011 
011110 001011 


* Received by the PGIT, November 20, 1959. The work leading
to this paper was sponsored by the Burroughs Corp.

† Auerbach Electronics Corp., Philadelphia, Pa.

¹ R. W. Hamming, "Error detecting and error correcting codes,"
Bell Sys. Tech. J., vol. 29, pp. 147-160; April, 1950.


TABLE I 
Seas Seen ean ie ao ns aes POOR OO ae Meo ia 
A(n, d) D219) WO) Aad eS Gara 
ae Oiotihe 5 eee 22 24 6* 12* 24* 
2 2 2 4 6* 12* 24* 
Grad oe COND SING Qu 
DO Ae 16 Qu 
d=2.24 8 16 32 64 27 28 29 Q0 Qu go 913 Qu gis 
24 8 16 32 64 27 28 29 20 Qu gi O13 Qu 915 O16 
S S S 
ll ll ll ll 
sH ioe) N co 
re re 


* These values differ from the corresponding values of B(n, d). 


form a code of minimum distance 3, as may be verified by 
comparing them pairwise. It is convenient to regard a set 
consisting of a single point as a code of minimum distance 
d for every positive integer d. 
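
The pairwise comparison mentioned above is easily mechanized. The following minimal sketch represents points as strings of binary digits and checks both the distance-4 example and the eight 6-digit points; the helper names `distance` and `minimum_distance` are illustrative.

```python
from itertools import combinations

def distance(a: str, b: str) -> int:
    """Number of positions in which two equal-length binary strings differ."""
    return sum(x != y for x, y in zip(a, b))

def minimum_distance(code) -> int:
    """Smallest pairwise distance among the points of a code."""
    return min(distance(a, b) for a, b in combinations(code, 2))

CODE = ["000000", "010101", "111000", "101101",
        "100110", "110011", "011110", "001011"]

print(distance("1011101000", "0111001001"))  # 4
print(minimum_distance(CODE))                # 3
```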

Clearly, for every ordered pair (n, d) of positive integers
there is some maximal number A(n, d)² of n-digit points
which might be selected to give a code of minimum distance
d. The code exhibited above demonstrates that
$A(6, 3) \ge 8$. It will be seen later that there does not exist
a set of nine 6-digit points at a distance 3 or greater pairwise.
The A(n, d) notation describes this situation by the
equation

A(6, 3) = 8.


Both the present paper and one by Hamming¹ are
concerned primarily with properties of the function
A(n, d). [Hamming's paper would not be very different
if he had used A(n, d) instead of his B(n, d).] Interest is
attached to this function by reason of its connection with
coding schemes for correction of errors in systems employing
binary symbols for handling information. Consider
a system for computing or transmitting n-digit binary
numbers, and having the property that noise or system
malfunction will affect at most x of the n digits in any
output number. There can be selected A(n, 2x + 1) but
no more n-digit numbers which form a code of minimum
distance 2x + 1. If the system can be designed or its
operation programmed in such a manner that correct
(i.e., error-free) operation will give rise to outputs consisting
exclusively of numbers in the code, then the
correct outputs will always be deducible from the actual
output. There will always be exactly one code number at
distance x or less from an output number. For example,


² This definition for A(n, d) is not the same as Hamming's definition
for his B(n, d), in that a somewhat less restrictive class of
codes is used here. $B(n, d) \le A(n, d)$ for all n, d. B(n, d) is always
a power of 2; A(n, d) need not be. The departure is for convenience
only and does not constitute a significant innovation.




if x = 1 and n = 6, there could be used the code exhibited
above to demonstrate that $A(6, 3) \ge 8$, since d = 3 =
2x + 1. An output of, for example, 101001 in such a
system could be "corrected" to 101101, that being the
code number at distance 1 or less from the actual output.
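
The correction step in this example amounts to finding the one code point within distance x of the received output. A minimal sketch, using the eight 6-digit points given earlier and a hypothetical helper `correct`:

```python
def distance(a: str, b: str) -> int:
    """Number of positions in which two equal-length binary strings differ."""
    return sum(x != y for x, y in zip(a, b))

def correct(received: str, code, max_errors: int):
    """Return the unique code point within max_errors of received, else None."""
    near = [p for p in code if distance(p, received) <= max_errors]
    return near[0] if len(near) == 1 else None

CODE = ["000000", "010101", "111000", "101101",
        "100110", "110011", "011110", "001011"]

print(correct("101001", CODE, 1))  # 101101
```

Because the code has minimum distance 3 = 2x + 1 with x = 1, no output can lie within distance 1 of two different code points, so the search never finds more than one candidate.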

Following is a summary of Hamming's results,³ which
are utilized in the present paper:

$$A(n, 1) = 2^n,$$

$$A(n, 2) = 2^{n-1},$$

$$A(n + 1, 2k) = A(n, 2k - 1),$$

$$A(n, 2k + 1) \le \frac{2^n}{1 + C(n, 1) + \cdots + C(n, k)},$$

where

$$C(n, k) = \frac{n!}{k!\,(n - k)!}.$$

Except for the unimportant difference between A(n, d)
and B(n, d), all definitions and results to this point are
due to Hamming.
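
The last of these relations is simple to evaluate. The sketch below assumes the bound has the sphere-packing form, with A(n, 2k + 1) at most 2^n divided by the sum of the binomial coefficients C(n, 0) through C(n, k); the function name `hamming_upper_bound` is illustrative.

```python
from math import comb

def hamming_upper_bound(n: int, k: int) -> int:
    """Integer part of 2**n / (C(n,0) + ... + C(n,k)), an upper bound on A(n, 2k+1)."""
    return 2 ** n // sum(comb(n, i) for i in range(k + 1))

print(hamming_upper_bound(7, 1))   # 16
print(hamming_upper_bound(25, 2))  # 102927
```

Since A(n, d) is an integer, taking the integer part of the quotient loses nothing.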

Definition: By the sum a * b of two n-digit points a, b is
meant that n-digit point whose jth digit is zero or unity according
as the jth digits of a, b are the same or different.
For example,

if a is 1011101000
and b is 0111001001
then a * b is 1100100001.

For any a, a * a is the origin or null-point 00 ··· 0, denoted
throughout by o.

Definition: |a| = m means that exactly m of the digits
of the point a are 1. In this notation, the distance between
two n-digit points a, b is |a * b|.

It is clear that addition as defined above is associative:
that (a * b) * c = a * (b * c). If K is a code of n-digit points
a, b, c, ··· of minimum distance d, then so is the code
denoted by K * x consisting of the points a * x, b * x,
c * x, ··· where x is any n-digit point. For pairwise
distances are preserved, since

|(a * x) * (b * x)| = |(a * b) * (x * x)|
= |(a * b) * o| = |a * b|.
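
The sum and weight just defined, and the fact that translating a code by a fixed point x preserves distances, can be sketched as follows. The names `star` and `weight` are illustrative, and the point x is an arbitrary choice made for the demonstration.

```python
def star(a: str, b: str) -> str:
    """Digitwise sum a * b: 0 where the digits agree, 1 where they differ."""
    return "".join("0" if x == y else "1" for x, y in zip(a, b))

def weight(a: str) -> int:
    """|a|: the number of digits of a equal to 1."""
    return a.count("1")

a, b = "1011101000", "0111001001"
x = "1111100000"  # arbitrary translating point

print(star(a, b))                             # 1100100001
print(weight(star(a, b)))                     # 4
print(weight(star(star(a, x), star(b, x))))   # 4, unchanged by translation
```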
Theorem 1: If $2d > n$, then $A(n, d) \le 2m \le 2d/(2d - n)$, m an integer.

Proof: Let K be any code consisting of A n-digit points
of minimum distance d. Let h be any point in K. Consider
the code K * h, as defined by the notation of the preceding
paragraph. Since h * h = o, o will be a member of K * h. By
the minimum distance property it follows that the other
A − 1 members of K * h each have at least d digits equal
to 1, so that the sum of |k * h| over all k in K is at least
(A − 1)d. This is true for each of the A possible choices
of h. Letting h also run through all possible values we find
that the total number N of 1's in the $A^2$ possible sums of
two points k * h for h, k both in K, must satisfy

$$A(A - 1)d \le N = \sum |h * k|,$$

the sum over all ordered pairs (h, k).

³ Hamming's proofs that the relations hold for B(n, d) are valid
without change for A(n, d).

Next, we obtain another inequality on N by considering
corresponding digits of each point in K. Suppose x points
in K have for their first digit 1 and the other A − x have for
their first digit 0. In the $A^2$ sums k * h, exactly 2x(A − x)
will have for their first digit 1. If y, z, ··· are defined in
similar manner for the second digits, third digits, ··· of
the points in K, the same number $N = \sum |h * k|$ is seen
to be expressible as

$$N = 2x(A - x) + 2y(A - y) + 2z(A - z) + \cdots.$$
Case 1): A = 2m. Each of the terms

2x(A − x), 2y(A − y), etc.,

is at most $A^2/2$. Since there are n such terms,

$$N \le nA^2/2.$$

Combining this inequality with $A(A - 1)d \le N$,

$$2(A - 1)d \le An$$

and

$$(2d - n)A \le 2d.$$

Since $2d > n$ by hypothesis,

$$A = 2m \le \frac{2d}{2d - n}.$$

Case 2): A = 2m − 1. Each of the terms

2x(A − x), 2y(A − y), etc.,

is at most $(A^2 - 1)/2$. Continuing as in Case 1), it may
be seen that

$$A = 2m - 1 < 2m \le \frac{2d}{2d - n}.$$

Corollary: A(n, n) = 2. By the above theorem
$A(n, n) \le 2$, and the pair 00···0, 11···1 is an
example showing $A(n, n) \ge 2$.

Corollary: $A(4m - 1, 2m) \le 4m$ and $A(4m - 2, 2m) \le 2m$.

Theorem 2: $A(n, d) \le 2A(n - 1, d)$.

Proof: Let K be a code of A(n, d) n-digit points of
minimum distance d. Separate the points of K into two
sets according to their first digit. At least one of the
two sets will contain one-half or more of the points. Deletion
of the first digit in each point of that set leaves a
code of minimum distance d containing at least

$$\frac{A(n, d)}{2}$$

(n − 1)-digit points. This proves the theorem.

Corollary: Since $A(4m - 1, 2m) \le 4m$, we have

$$A(4m, 2m) \le 8m.$$

Also, if A(4m, 2m) = 8m, then A(4m − 1, 2m) = 4m and
A(4m − 2, 2m) = 2m.
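
The bound of Theorem 1 and the corollaries that follow from it can be evaluated directly. In this sketch the bound is taken to be the largest even integer 2m not exceeding 2d/(2d − n), which is how the theorem's statement is read here; the function name `theorem1_bound` is illustrative.

```python
def theorem1_bound(n: int, d: int) -> int:
    """Largest even integer 2m with 2m <= 2d / (2d - n); requires 2d > n."""
    if 2 * d <= n:
        raise ValueError("Theorem 1 requires 2d > n")
    return 2 * (d // (2 * d - n))

print(theorem1_bound(13, 8))   # 4
print(theorem1_bound(5, 5))    # 2, the corollary A(n, n) = 2

for m in (1, 2, 3, 5):
    # Corollaries: A(4m-1, 2m) <= 4m and A(4m-2, 2m) <= 2m.
    print(m, theorem1_bound(4 * m - 1, 2 * m), theorem1_bound(4 * m - 2, 2 * m))
    # Theorem 2 then doubles the first value to give A(4m, 2m) <= 8m.
    assert 2 * theorem1_bound(4 * m - 1, 2 * m) == 8 * m
```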


Theorem 3: If 4m − 1 is a prime, then A(4m, 2m) = 8m.

Proof: Since we have shown $A(4m, 2m) \le 8m$, it is
sufficient to construct a code of 8m 4m-digit points of
minimum distance 2m.⁴ One such construction is included
in the Appendix.

Theorem 4: $A(2n, 2d) \ge A(n, 2d)\, A(n, d)$.

Proof: For this proof only the symbol $\frown$ denoting concatenation
of two sets of symbols is introduced. Its
meaning is illustrated by the example:

If a is 1011
and b is 00111,
then $a \frown b$ is 101100111.

Clearly, $|a \frown b| = |a| + |b|$ for any a, b.

Let L be a code of minimum distance d containing
A(n, d) n-digit points, and M a code of minimum distance
2d containing A(n, 2d) n-digit points. From these will be
constructed a code K of minimum distance 2d, containing
A(n, d)A(n, 2d) points. This will prove the theorem.

Define K as the set of all points $u = (a \frown a) * (b \frown o)$ for a in
L and b in M, o being the n-digit null point. The points
u will of course be 2n-digit points. There are A(n, d)
A(n, 2d) distinct pairs a, b. If it can be shown that two
distinct pairs $a_1, b_1$ and $a_2, b_2$ give rise to points $u_1, u_2$
in K at a distance at least 2d, the theorem is proved.


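$$\begin{aligned}
u_1 * u_2 &= \{(a_1 \frown a_1) * (b_1 \frown o)\} * \{(a_2 \frown a_2) * (b_2 \frown o)\} \\
&= \{(a_1 \frown a_1) * (a_2 \frown a_2)\} * \{(b_1 \frown o) * (b_2 \frown o)\} \\
&= \{(a_1 * a_2) \frown (a_1 * a_2)\} * \{(b_1 * b_2) \frown o\}.
\end{aligned}$$

For $a_1, b_1$ and $a_2, b_2$ distinct pairs, three cases may occur.

1) $a_1 = a_2$, $b_1 \ne b_2$. In this case

$$|u_1 * u_2| = |(o \frown o) * \{(b_1 * b_2) \frown o\}| = |(b_1 * b_2) \frown o| = |b_1 * b_2|.$$

But, by hypothesis, $b_1$, $b_2$ are members of code M of
minimum distance 2d, so that in this case $|u_1 * u_2| \ge 2d$.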


⁴ Since the original writing of this report, the author has learned
that such codes are a special case of a more general class that may
be constructed by methods given by R. E. A. C. Paley, "On Orthogonal
Matrices," J. Math. and Phys., vol. 12, pp. 311-320; 1933.
By virtue of Paley's work, Theorem 3 may be stated not only
for 4m − 1 prime, but for 4m of the form $2^k(p^h + 1)$, p an odd
prime and h, k integers.




2) $a_1 \ne a_2$, $b_1 = b_2$. In this case

$$|u_1 * u_2| = |\{(a_1 * a_2) \frown (a_1 * a_2)\} * \{o \frown o\}| = |(a_1 * a_2) \frown (a_1 * a_2)| = 2\,|a_1 * a_2|$$

and since $|a_1 * a_2| \ge d$ by hypothesis, again $|u_1 * u_2| \ge 2d$.

3) $a_1 \ne a_2$, $b_1 \ne b_2$. In this case we write

$$|u_1 * u_2| = |\{(a_1 * a_2) * (b_1 * b_2)\} \frown (a_1 * a_2)| = |(a_1 * a_2) * (b_1 * b_2)| + |a_1 * a_2|.$$

For any x, y, $|x * y| \ge |x| - |y|$, or

$$|x * y| + |y| \ge |x|.$$

Therefore, $|u_1 * u_2| \ge |b_1 * b_2| \ge 2d$.
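
The construction used in this proof can be tried on a small case. The sketch below takes L to be the 4-digit points with an even number of 1's (minimum distance 2) and M the two-point code 0000, 1111 (minimum distance 4), and forms every point (a a) * (b o), which is simply a * b followed by a. The function names are illustrative.

```python
from itertools import combinations, product

def star(a: str, b: str) -> str:
    """Digitwise sum: 0 where the digits agree, 1 where they differ."""
    return "".join("0" if x == y else "1" for x, y in zip(a, b))

def min_distance(code) -> int:
    """Smallest pairwise distance among the points of a code."""
    return min(star(a, b).count("1") for a, b in combinations(code, 2))

def theorem4_code(L, M):
    """All points (a a) * (b o): the first half is a * b, the second half is a."""
    return [star(a, b) + a for a in L for b in M]

L = ["".join(bits) for bits in product("01", repeat=4) if bits.count("1") % 2 == 0]
M = ["0000", "1111"]
K = theorem4_code(L, M)

print(len(L), min_distance(L))   # 8 2
print(len(K), min_distance(K))   # 16 4
```

The result is a code of 16 distinct 8-digit points at minimum distance 4, as the theorem predicts for n = 4, d = 2.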


Theorem 5: If A(4m, 2m) = 8m holds for m = x, then
it also holds for m = 2x.

Proof: $A(8m, 4m) \ge A(4m, 4m)\,A(4m, 2m) = 2A(4m, 2m)$.
Also, $A(8m, 4m) \le 16m$ by the corollary to Theorem
2. Therefore, A(4m, 2m) = 8m implies A(8m, 4m) = 16m,
which was to be proved.

Theorems 3 and 5 together prove that A(4m, 2m) = 8m
holds for a number of values of m. For $m \le 20$, the values
which are not reached are 7, 9, 13, 14, and 19. I know
of no m for which I can show $A(4m, 2m) \ne 8m$. This
suggests the conjecture A(4m, 2m) = 8m for all m.
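
The list of values of m not reached can be reproduced mechanically. This sketch applies only Theorem 3 (4m − 1 prime) and Theorem 5 (doubling), not the wider class of footnote 4; the function names are illustrative.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small n."""
    return n >= 2 and all(n % q for q in range(2, int(n ** 0.5) + 1))

def reached(limit: int) -> set:
    """Values m <= limit for which Theorems 3 and 5 give A(4m, 2m) = 8m."""
    ok = {m for m in range(1, limit + 1) if is_prime(4 * m - 1)}  # Theorem 3
    for m in range(2, limit + 1, 2):      # Theorem 5, applied in increasing order
        if m // 2 in ok:
            ok.add(m)
    return ok

print(sorted(set(range(1, 21)) - reached(20)))  # [7, 9, 13, 14, 19]
```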

From Theorems 1 to 5, in conjunction with Hamming's
results, there may be deduced for a number of n, d the
exact value of A(n, d). With few exceptions, these n, d
lie in the region $2d > n$. This is the region in which
$A(n, d) < 4d$ and one would expect it for that reason to
be the least interesting region from the point of view of
practical applications. The known values of A(n, d) for
$d \le 8$, $n \le 16$ are shown in Table I. Corresponding values
of B(n, d) are given in Table II. The bottom two rows,
d = 1 and d = 2, are given by Hamming; and values
for n, d = 3, 3; 4, 4; 7, 3; 8, 4; 15, 3; 16, 4 are special
cases of Golay's formula.⁵ No single method can be


TABLE II 
Da Be Siie NOTE RE mie Yaa aa Ba i at ihc) 
B(n, d) 2 Wa OD AAS eaeiG ar 
Gres 62196 Re Boal 2 29 ge eA nie eel 
pe aoe Gouge tba 

d=4 J 22 A 8 A661G.-32). 64 127 28 29 omy on 
22 4 8 16 16 32 64 27 28 29 gio gu gu 
d=2. .24 8 16 32 64 27 28 29 210 Qu Qu gis Qu gis 
2 4 8 16 32 64 27 28 29 Qo Qu 9% 918 Qu QI5 QI6 

Ss s Ss 
T I T T 
2 a 2 


⁵ M. J. E. Golay, "Notes on digital coding," Proc. IRE, vol.
37, p. 657; June, 1949.




prescribed for finding the values given for different n, d.
To illustrate the procedures it will be shown that
A(13, 8) = 4.

Because 8 − 1 = 7 is a prime, A(8, 4) = 16 by
A(4m, 2m) = 8m. This in turn implies that A(8m, 4m) =
16m, or A(16, 8) = 32.

$$A(13, 8) \ge \frac{A(14, 8)}{2} \ge \frac{A(15, 8)}{4} \ge \frac{A(16, 8)}{8} = 4$$

by Theorem 2. By Theorem 1,

$$A(13, 8) \le 2m \le \frac{16}{16 - 13} = \frac{16}{3},$$

or $A(13, 8) \le 4$.
Combining the two inequalities, A(13, 8) = 4.

For n > 2d, although they do not provide exact values
of A(n, d), Theorems 1 to 5 may be useful in obtaining
bounds. Again, the method chosen will be different for
different n, d. For purposes of illustration the case n = 26,
d = 6 is discussed.

$$A(26, 6) = A(25, 5) \le \frac{2^{25}}{1 + 25 + 300} = \frac{128}{163} \cdot 2^{17},$$

$$A(26, 6) \ge A(13, 6)\,A(13, 3) \ge A(12, 6)\,A(14, 4) \ge 24\,A(7, 4)\,A(7, 2),$$

and

$$A(26, 6) \ge (24)(8)(64) = 3 \cdot 2^{12}.$$

This tells us that

$$3 \cdot 2^{12} \le A(26, 6) \le \frac{128}{163} \cdot 2^{17}.$$

Further, it tells us how to construct a code of $3 \cdot 2^{12}$ points
for n = 26, d = 6, because all theorems of this paper
bounding A(n, d) from below are constructive in nature.
In order to construct such a code the inequalities leading
to $A(26, 6) \ge 3 \cdot 2^{12}$ are retraced.

First let $K_1$ be the code of 8 points for n = 4, d = 2,
consisting of all 4-digit points which have an even number
of 1's among their digits. From $K_1$ and the two point code
1111, 0000, there may be constructed by the method of
Theorem 4 a code $K_2$ of (8)(2) = 16 points with n = 8,
d = 4. If the sixteen points of $K_2$ are separated into two
sets according to whether the last digit is 0 or 1, at least
one of the sets will have eight or more points and deletion
of the last digit will give a code $K_3$ of at least eight members
with n = 7, d = 4. We have now got as far as A(7, 4) = 8
in retracing the inequalities. Next we take $K_4$ as the code
of 64 points consisting of all 7-digit points with an even
number of 1's among their digits. $K_4$ exemplifies A(7, 2) =
64. From $K_3$ and $K_4$ there may be constructed by the
method of Theorem 4 a code $K_5$ of $A(7, 2)A(7, 4) = 2^9$
points, with n = 14 and d = 4. By merely suppressing
the last digit of $K_5$ we get a code $K_6$ with the same number
$2^9$ of points, n = 13, d = 3.

Since 4·3 − 1 = 11 is a prime, the method of Theorem
3 permits construction of a code $K_7$ exemplifying
A(12, 6) = 24. By the possibly wasteful process of adding
a 0 at the end of each point of $K_7$ there may be constructed
$K_8$, a code of 24 points with n = 13, d = 6.
Finally from $K_6$ and $K_8$ there may be obtained, again by
the method of Theorem 4, our desired code for n = 26,
d = 6, with at least $24 \cdot 2^9 = 3 \cdot 2^{12}$ points.


APPENDIX 


PROOF THAT A(4m, 2m) = 8m IF 4m − 1 IS A PRIME


In this proof all congruences are modulo 4m − 1 if not
otherwise noted. The first and greater part of the proof
consists of constructing a set $a_1, a_2, \cdots, a_{4m-1}$ of (4m − 1)-digit
points satisfying

$$|a_j| = 2m \quad \text{and} \quad |a_j * a_k| = 2m, \quad k \ne j,$$

for

$$j, k = 1, 2, \cdots, 4m - 1.$$

After that the rest is simple.

It is a well-known theorem in elementary number
theory that every odd prime p has a primitive root r: an
integer such that each of $r, r^2, r^3, \cdots, r^{p-1}$ is congruent
to a different one of 1, 2, 3, ···, p − 1. Let r be a primitive
root of the prime 4m − 1.

An integer $x \not\equiv 0$ modulo p is called a quadratic residue
of a prime p if there exists another integer y satisfying
$y^2 \equiv x$ modulo p. If there exists no y satisfying $y^2 \equiv x$
modulo p then x is called a quadratic nonresidue of
p. It is known from elementary number theory that
exactly half of the integers 1, 2, ···, p − 1 are quadratic
residues and half are quadratic nonresidues and that −1
is a quadratic nonresidue of all primes of the form 4m − 1.
The numbers $r^2, r^4, \cdots, r^{4m-2}$ are all quadratic residues
of 4m − 1, for clearly $y = r^k$ satisfies $y^2 \equiv r^{2k}$ for k = 1,
2, ···, 2m − 1. Therefore, $r, r^3, \cdots, r^{4m-3}$ are the
quadratic nonresidues of 4m − 1. The numbers $-r^2,
-r^4, \cdots, -r^{4m-2}$ are also quadratic nonresidues, for if
there were a w satisfying $w^2 \equiv -r^{2k}$ there would be a
y (namely, the y satisfying $w \equiv yr^k$) which satisfies
$y^2 \equiv -1$, and this is known to be impossible modulo
4m − 1. The numbers $r, r^3, \cdots, r^{4m-3}$ are, therefore, each
congruent to a different one of $-r^2, -r^4, \cdots, -r^{4m-2}$,
each set containing one member congruent to each of the
nonresidues among 1, 2, 3, ···, 4m − 2.

I shall construct the @y, Ga, *** , Gm—1. In terms of them 
binary digits. To that end, I first ieee a binary digit 2, 
for all integral 7 by: 


there would be a 


2; = 1 if 7 is a quadratic residue of 4m — 1; #.e., if 7 is 


congruent to one of r’, r*, r°, +++ , r4#"7?, 


? 


2; = O if 2 is a quadratic nonresidue of 4m — 1; i.e., if 
2 is congruent to one of r, 7°, «++ , r4"~*, 
m2; = lift#=0. 
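The definition of the digits $z_i$ can be sketched in a few lines. This is an illustrative sketch only: the helper name `z` and the choice to compute residues by squaring (rather than through powers of a primitive root) are assumptions, although both descriptions give the same set.

```python
def z(i, p):
    """Digit z_i for the prime p = 4m - 1: 1 on residues and on multiples of p, else 0."""
    i %= p
    if i == 0:
        return 1
    residues = {(y * y) % p for y in range(1, p)}
    return 1 if i in residues else 0


p = 11
digits = [z(i, p) for i in range(1, p + 1)]   # z_1, ..., z_11
print(digits)
```

Because the subscript is reduced modulo $p$ first, negative subscripts such as $z_{-1}$ are handled without special cases.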


The $z_i$ so defined have the property, as is easily verified, that $z_i = z_{ir^2}$ for every $i$. It follows that $z_i = z_{ir^2} = z_{ir^4} = z_{ir^6} = \cdots$ etc., for every $i$. Also, since $r^2, r^4, r^6, \ldots, r^{4m-2}$ are congruent to $-r, -r^3, \ldots, -r^{4m-3}$ in some order, the above equations may be expressed

$$z_i = z_{-ir} = z_{-ir^3} = z_{-ir^5} = \cdots \ \text{etc.,} \quad \text{or} \quad z_{-i} = z_{ir} = z_{ir^3} = z_{ir^5} = \cdots \ \text{etc., for every } i.$$

These equations may all be combined into

$$z_{ir^k} = \begin{cases} z_i, & k \text{ even} \\ z_{-i}, & k \text{ odd} \end{cases}$$

for any $i$ and $k$.
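The combined identity can be checked numerically for small primes of the form $4m - 1$. The sketch below is an assumption-laden check, not a proof: the helper names `z` and `identity_holds` are illustrative, and the primitive roots 2 (for 11) and 3 (for 7) are supplied by hand.

```python
def z(i, p):
    """Digit z_i: 1 if i is a quadratic residue mod p or i = 0 (mod p), else 0."""
    i %= p
    if i == 0:
        return 1
    return 1 if i in {(y * y) % p for y in range(1, p)} else 0


def identity_holds(p, r):
    """Check z_{i r^k} = z_i for even k and z_{-i} for odd k, over a range of i and k."""
    return all(
        z(i * r**k, p) == (z(i, p) if k % 2 == 0 else z(-i, p))
        for i in range(-p, 2 * p)
        for k in range(2 * (p - 1))
    )


print(identity_holds(11, 2), identity_holds(7, 3))
```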

The $a_j$ are now defined. Let $a_1$ be the $(4m - 1)$-digit point whose $i$th digit is $z_i$, $i = 1, 2, \ldots, 4m - 1$. For $j = 2, 3, \ldots, 4m - 1$, $a_j$ is obtained by cyclic permutation of the digits of $a_1$: let $z_{i+j-1}$ be the $i$th digit of $a_j$. Consider the digits of $a_1$. The last one is $z_{4m-1}$, which is 1 because $4m - 1 \equiv 0$. Of the others, half of the subscripts are residues and half are nonresidues; i.e., half the digits are 1's and half 0's. Therefore, $2m$ of the digits are 1's, and $2m - 1$ are 0's; $|a_1| = 2m$. But since $a_j$ is obtained by permuting the digits of $a_1$, we have

$$|a_j| = 2m, \qquad j = 1, 2, \ldots, 4m - 1.$$

This is one of the two conditions we shall require on the $a_j$, the other being

$$|a_j * a_k| = 2m \quad \text{if } j \neq k.$$

We now verify that the second condition is also met. Let

$$s_{jk} = |a_j * a_k|, \qquad j \neq k; \quad j, k = 1, 2, \ldots, 4m - 1.$$

I wish to show $s_{jk} = 2m$ for all $j \neq k$. By the cyclic construction of the $a_j$, it is clear that $|a_j * a_k| = |a_1 * a_{k-j+1}|$ if $k > j$. It is, therefore, sufficient to prove $s_{1,k} = 2m$, $k = 2, 3, \ldots, 4m - 1$; or $s_{1,u+1} = 2m$, $u = 1, 2, \ldots, 4m - 2$.

$s_{1,u+1} = |a_1 * a_{u+1}|$ is investigated directly by comparison of corresponding digits of $a_1$ and $a_{u+1}$. These are, in order, $z_1, z_{u+1}$; $z_2, z_{u+2}$; $\ldots$; $z_{4m-1}, z_{u+4m-1}$. $s_{1,u+1}$ is the number of pairs $z_i, z_{u+i}$ for which $z_i \neq z_{u+i}$. It is convenient to rearrange the pairs as follows (I utilize $z_{4m-1} = z_0$, $z_{u+4m-1} = z_u$, etc.):

$$z_0, z_u; \quad z_u, z_{2u}; \quad \ldots; \quad z_{-u}, z_0.$$

This rearrangement will always be possible because the first elements of the pairs are the same for both sets, $0, u, 2u, \ldots$, running through the values $1, 2, 3, \ldots$ for $u = 1, 2, \ldots, 4m - 2$.

If $u$ is a quadratic residue of $4m - 1$, then $u \equiv r^{2k}$ and we had seen that $z_{ir^{2k}} = z_i$. We may express the pairs $z_0, z_u$; $z_u, z_{2u}$; $\ldots$; $z_{-u}, z_0$ as $z_0, z_{r^{2k}}$; $z_{r^{2k}}, z_{2r^{2k}}$; $\ldots$; $z_{-r^{2k}}, z_0$, or finally as $z_0, z_1$; $z_1, z_2$; $\ldots$; $z_{-1}, z_0$. To summarize, $s_{1,u+1}$ is the number of pairs of adjacent elements $z_i, z_{i+1}$ in a complete cycle $z_0, z_1, z_2, \ldots, z_{-1}, z_0$ for which $z_i \neq z_{i+1}$, if $u$ is a quadratic residue of $4m - 1$.

If $u$ is a quadratic nonresidue of $4m - 1$, then $u \equiv r^{2k-1}$ and we had seen that $z_{ir^{2k-1}} = z_{-i}$. In this case the pairs $z_0, z_u$; $z_u, z_{2u}$; $\ldots$; $z_{-u}, z_0$ may be expressed as $z_0, z_{r^{2k-1}}$; $z_{r^{2k-1}}, z_{2r^{2k-1}}$; $\ldots$; $z_{-r^{2k-1}}, z_0$, and finally as $z_0, z_{-1}$; $z_{-1}, z_{-2}$; $\ldots$; $z_1, z_0$. This time it is seen that $s_{1,u+1}$ is the number of pairs of adjacent elements $z_i, z_{i-1}$ in a complete cycle $z_0, z_{-1}, z_{-2}, \ldots, z_1, z_0$ for which $z_i \neq z_{i-1}$, if $u$ is a quadratic nonresidue of $4m - 1$.

But the two cycles, for $u$ a residue and for $u$ a nonresidue, give the same value of $s_{1,u+1}$, for one cycle is the other written backwards. It remains to find $s_{1,u+1} = s$, the distance $|a_j * a_k|$ for any distinct $j, k$. Consider the first digit of $a_j$, $j = 1, 2, \ldots, 4m - 1$. It will be $z_j$, which has been seen to take on the value 1 for $2m$ of the $j$ and 0 for $2m - 1$ of the $j$. Exactly $2m(2m - 1)$ of the $(4m - 1)(4m - 2)/2$ different $a_j * a_k$, $j \neq k$, will have 1 for the first digit. This is true also for the second, third, etc., digits by reason of the cyclic construction of the $a_j$. The total number of 1's in all the different $a_j * a_k$ is therefore $\sum_{j<k} |a_j * a_k| = (4m - 1)\,2m\,(2m - 1)$. But this sum is also

$$s\,\frac{(4m - 1)(4m - 2)}{2}$$

because $s$ was the distance between each pair and there are $(4m - 1)(4m - 2)/2$ pairs $a_j, a_k$, $j \neq k$. Combining the two,

$$s = 2m = |a_j * a_k| \quad \text{for } j \neq k; \qquad j, k = 1, 2, \ldots, 4m - 1.^{6}$$

Now that there have been constructed the $a_j$ with the two desired properties $|a_j| = 2m$ and $|a_j * a_k| = 2m$ if $j \neq k$, it is easy to construct a code to demonstrate $A(4m, 2m) \geq 8m$.
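The two properties of the $a_j$, and the counting identity used to obtain $s = 2m$, can be checked for small cases. The following is a minimal sketch under stated assumptions: the names `cyclic_points` and `distance` are illustrative, digits are stored as Python lists, and $4m - 1$ is assumed to be prime.

```python
from itertools import combinations


def cyclic_points(p):
    """Points a_1, ..., a_p for a prime p = 4m - 1; the i-th digit of a_j is z_{i+j-1}."""
    residues = {(y * y) % p for y in range(1, p)}
    z = [1 if (i == 0 or i in residues) else 0 for i in range(p)]   # z_0, ..., z_{p-1}
    return [[z[(i + j - 1) % p] for i in range(1, p + 1)] for j in range(1, p + 1)]


def distance(a, b):
    """Number of places in which a and b differ, i.e. |a * b|."""
    return sum(x != y for x, y in zip(a, b))


m = 3
points = cyclic_points(4 * m - 1)
weights = {sum(a) for a in points}
distances = {distance(a, b) for a, b in combinations(points, 2)}
total = sum(distance(a, b) for a, b in combinations(points, 2))
print(weights, distances, total)
```

For $m = 3$ the total of all pairwise distances should equal $(4m - 1)\,2m\,(2m - 1) = 330$, which is the sum evaluated in two ways in the text.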


6 This implies that there are $2m$ alternations (1 followed by 0, or 0 followed by 1) in the sequence $z_0 z_1 z_2 \cdots z_{-1} z_0$. Since $z_1 = z_0 = 1$ and $z_{-1} = 0$, it follows that for primes of form $4m - 1$ the quadratic residues among $1, 2, \ldots, 4m - 2$ occur in exactly $m$ blocks, as do the nonresidues.




For $j = 1, 2, \ldots, 4m - 1$, let $b_j$ be the $4m$-digit point obtained by adding an 0 at the end of $a_j$. Because $|a_j| = 2m$ and $|a_j * a_k| = 2m$ for $j \neq k$ it is clear that $|b_j| = 2m$ and $|b_j * b_k| = 2m$ for $j \neq k$. I denote by $e$ the $4m$-digit point whose digits are all 1; by $o$, as before, the $4m$-digit point whose digits are all 0.

I claim that the points $e$, $o$, $b_j$, $e * b_j$ form a code of $8m$ $4m$-digit points of minimum distance $2m$, demonstrating that $A(4m, 2m) \geq 8m$. Clearly there are $8m$ points in the code, each of $4m$ digits. Only the minimum distance requirement need be established. Since $|b_j| = 2m$ implies $|e * b_j| = 2m$, the zeros of $b_j$ being the 1's of $e * b_j$ and conversely, it is clear that $e$, $o$ are each at distance $2m$ from each of the remaining points. Also, $|b_j * b_k| = 2m$ for $j \neq k$ implies $|(e * b_j) * (e * b_k)| = 2m$ for $j \neq k$, the two distances being equal. It remains only to check $|b_j * (e * b_k)|$. But this is equal to $|e * (b_j * b_k)|$, and is equal to $2m$ or $4m$ because $|b_j * b_k| = 2m$ or 0 depending on whether $j \neq k$ or $j = k$.

This completes the proof that the code as constructed exemplifies $A(4m, 2m) \geq 8m$. The construction of such a code for given $m$ is quite simple in practice, compared to the proof above that the code constructed fulfills the requirements. The case $m = 3$ is illustrated.
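The whole construction can be sketched directly. The function names `plotkin_code` and `min_distance` are illustrative assumptions, words are stored as tuples of 0/1 digits, and the code assumes without checking that $4m - 1$ is prime.

```python
from itertools import combinations


def plotkin_code(m):
    """The 8m points o, e, b_j, e * b_j of 4m digits, assuming 4m - 1 is prime."""
    p = 4 * m - 1
    residues = {(y * y) % p for y in range(1, p)}
    z = [1 if (i == 0 or i in residues) else 0 for i in range(p)]
    # b_j is a_j with a 0 appended; the i-th digit of a_j is z_{i+j-1}
    b = [tuple(z[(i + j - 1) % p] for i in range(1, p + 1)) + (0,)
         for j in range(1, p + 1)]
    complements = [tuple(1 - d for d in word) for word in b]   # the points e * b_j
    return [(0,) * (4 * m), (1,) * (4 * m)] + b + complements


def min_distance(code):
    """Smallest number of differing places over all pairs of distinct code points."""
    return min(sum(x != y for x, y in zip(u, v)) for u, v in combinations(code, 2))


code = plotkin_code(3)
print(len(code), min_distance(code))
print("".join(map(str, code[2])))   # b_1
```

Checking the minimum distance by exhaustive pairwise comparison is feasible here because the code has only 24 points.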

For $m = 3$, $4m - 1 = 11$. It is readily determined that 2 is a primitive root of 11, the numbers $2, 2^2, 2^3, \ldots, 2^{10}$ being congruent modulo 11 to 2, 4, 8, 5, 10, 9, 7, 3, 6, 1, respectively. The second, fourth, etc., of these are the residues: 4, 5, 9, 3, 1; the others 2, 8, 10, 7, 6, the nonresidues. The definition of $z_i$ requires that $z_i = 1$ for $i = 1, 3, 4, 5, 9$, and also for $i = 11$; $z_i = 0$ for $i = 2, 6, 7, 8, 10$. This gives us the $a_j$:

a_1:   1 0 1 1 1 0 0 0 1 0 1
a_2:   0 1 1 1 0 0 0 1 0 1 1
a_3:   1 1 1 0 0 0 1 0 1 1 0
(etc., by cyclic permutation)
a_10:  0 1 1 0 1 1 1 0 0 0 1
a_11:  1 1 0 1 1 1 0 0 0 1 0

The desired code of $8m = 24$ points of $4m = 12$ digits each, of minimum distance $2m = 6$, is the following:

o:     000000 000000      e:         111111 111111
b_1:   101110 001010      e * b_1:   010001 110101
b_2:   011100 010110      e * b_2:   100011 101001
b_3:   111000 101100      e * b_3:   000111 010011
  ...                       ...
b_10:  011011 100010      e * b_10:  100100 011101
b_11:  110111 000100      e * b_11:  001000 111011.


On Decoding Linear Error-Correcting Codes—I*


NEAL ZIERLER†


Summary: A technique is described for finding simply computable numerical-valued functions of a received binary word whose value indicates where errors in transmission have occurred. Although it seems that a certain condition must usually be fulfilled for such functions to exist, or for our method to constitute an efficient procedure for finding them, there is, on the one hand, a strong tendency for "good" codes to satisfy the condition, while, on the other, it appears to be straightforward to construct codes which are good for a specified channel and also fulfill the condition. An advantage of the resulting decoding procedure is that it corrects and detects all possible errors; more precisely, if a word $u$ is received and the coset $\bar{u}$ to which $u$ belongs has a unique leader $e$, the procedure concludes that $u + e$ was sent, while if $\bar{u}$ has no unique leader, that fact, along with the weight of $\bar{u}$ (and sometimes a little more) can be indicated. The ideas and techniques are illustrated by the construction of decoding procedures for the perfect (23, 12) three-error-correcting code.


* Received by the PGIT, November 12, 1959. Operated with support from the U. S. Army, Navy and Air Force.
† Lincoln Lab., Mass. Inst. Tech., Lexington, Mass.


I. INTRODUCTION


The type of decoding procedure with which this paper is concerned may be briefly described as follows (see also the Summary). Let $V$ be the space of binary $n$-tuples; let $A$, the code, be a $k$-dimensional subspace and let $L$ be an isomorphism of the space of $A$-cosets in $V$ on the space $S$ of binary $(n-k)$-tuples. Let $\Gamma$ denote a group of weight-preserving automorphisms of $S$ (the weight of a coset is the smallest number of ones to be found among its members and $L$ transfers this weight to $S$). Regard the members of $S$ as rows and the members of $\Gamma$ as matrices relative to the natural basis of $S$. $\Gamma$ decomposes $S$ in two different ways into collections of subsets $\{\theta_i\}$ and $\{\theta_i'\}$ as follows: $s$ and $t$ are in the same $\theta_i$ ($\theta_i'$) if, and only if, $s\alpha = t$ ($s\alpha' = t$) for some $\alpha \in \Gamma$, where $\alpha'$ denotes the transpose of $\alpha$. We shall see that the number $\kappa$ of sets in these two decompositions is the same and that if $\kappa$ is small, i.e., approximately $n$, then an effective decoding procedure consists essentially in determining to which $\theta_i$ a certain member of $S$, readily obtained as a function of the received $n$-tuple, belongs. Furthermore, this determination may be made in the following way. Let $\psi_i$, $i = 1, \ldots, \kappa$, denote the numerical valued function on $S$ defined as

$$\psi_i(s) = \sum_{t \in \theta_i'} (-1)^{\sum_{j=1}^{n-k} s_j t_j}.$$

Then the set of numbers $\psi_1(s), \ldots, \psi_\kappa(s)$ are characteristic of the $\theta_i$ to which $s$ belongs. Indeed, a properly chosen small number of the $\psi_i$ will often suffice to distinguish the $\theta_i$, for there is a tendency for all of the $\psi_i$ to be rational functions of a few, and hence the few must separate the $\theta_i$.

Our results are immediate descendents of some work of Prange [5], [6], [8] and were inspired in part by an observation of W. I. Wells.

The theory is developed in Sections II and III, while Sections IV and V are devoted to an example (the construction of decoding procedures for the (23, 12) three-error-correcting code of Golay-Paige) which is intended to illustrate the ideas and techniques in sufficient detail to serve provisionally as a model for further applications. The computations involved in designing the decoding procedure are sometimes more easily carried out in $V$ than $S$. The details of the way in which this may be done, with an example, will be treated in a second paper.¹


¹ Probably to be given at the Fourth London Symposium on Information Theory.


II. PRELIMINARIES


Let $K = \{0, 1\}$ with addition and multiplication defined by $0 + 0 = 1 + 1 = 0$, $0 + 1 = 1 + 0 = 1$, $0 \cdot 1 = 1 \cdot 0 = 0 \cdot 0 = 0$, $1 \cdot 1 = 1$. Let $n$ be a positive integer and let $V_n = V$ be the vector space of $n$-tuples of elements of $K$; that is, an element of $V$ is a $1 \times n$ matrix $u = (u_1, \ldots, u_n)$ with coefficients (or "components") in the field $K$. Thus, $V$ is a commutative group under the (componentwise, in $K$) addition of its elements, multiplication (componentwise) of members of $V$ by elements of $K$ is defined, and every subgroup of the group $V$ is automatically a subspace of the vector space $V$.² For $u, v \in V$ and $B \subset V$, we define the norm of $u$, $\|u\|$, to be the number of nonzero components of $u$; the distance from $u$ to $v$, $d(u, v)$, is $\|u + v\|$ and the distance from $u$ to $B$, $d(u, B)$, is minimum $\{\|u + b\| : b \in B\}$. Assume $n > 1$ and let $k$ be a positive integer less than $n$. An $(n, k)$ code is determined by the choice of a $k$-dimensional subspace $A$ of $V$ and a linear transformation $T$ of $V_k$ on $A$. Two $(n, k)$ codes with respective $k$-dimensional subspaces $A$ and $B$ of $V$ are said to be equivalent if $A$ is mapped on $B$ by the automorphism $(u_1, \ldots, u_n) \to (u_{p(1)}, \ldots, u_{p(n)})$ induced by some permutation $i \to p(i)$ of the indexes $1, \ldots, n$. It is easy to see ([7]) that any $(n, k)$ code is equivalent to one with space $A$ and linear transformation $T$ of $V_k$ on $A$ such that $(vT)_i = v_i$ for $i = 1, \ldots, k$; we assume, henceforward, that $A$, $T$ are a fixed $k$-dimensional subspace of $V$ and linear transformation of $V_k$ on $A$ with this property. A coset of $V$ modulo $A$ is a subset of $V$ of the form $\bar{v} = \{v + a : a \in A\}$ where $v$ is some fixed element of $V$. Two cosets are either disjoint or identical, and the observation $\overline{u + v + a} = \overline{u + v}$ shows that the set $V/A$ of cosets is a vector space over $K$ under the addition $\bar{u} + \bar{v} = \overline{u + v}$. Since each coset contains $2^k$ members, $V/A$ has $2^n / 2^k = 2^{n-k}$ elements and so is isomorphic to $V_{n-k}$. For $u \in V$, we define the weight of $u$, $W(u)$ = the weight of $\bar{u}$, $W(\bar{u})$, $= d(0, \bar{u}) = $ minimum $\{\|v\| : v \in \bar{u}\}$, and a leader of $\bar{u}$ is an element $v$ of $\bar{u}$ with $\|v\| = W(\bar{u})$. By a decoding procedure for a subset $C$ of $V$ (consisting of entire cosets) we shall mean a mapping $D$ of $C$ in $V_k$ such that, for $u \in C$, $D(u)T + u$ depends only on the coset $\bar{u}$ to which $u$ belongs and $\|D(u)T + u\| = W(\bar{u})$ (see [7]). Clearly, $D$ may be regarded as a function on part of $V/A$, and, once an isomorphism of $V/A$ on $V_{n-k}$ has been singled out, as a function on part of $V_{n-k}$.
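The notions of coset, weight and leader can be made concrete on a toy code. The sketch below is illustrative only: the systematic $(5, 2)$ code defined by `encode` is a hypothetical example chosen for this sketch (it does not appear in the paper), and the helper names are assumptions.

```python
from itertools import product

n, k = 5, 2


def encode(v):
    """Hypothetical systematic map T of V_2 on A: the first k digits repeat the message."""
    v1, v2 = v
    return (v1, v2, v1, v2, (v1 + v2) % 2)


A = [encode(v) for v in product((0, 1), repeat=k)]   # the code, a subspace of V


def add(u, v):
    return tuple((x + y) % 2 for x, y in zip(u, v))


def coset(u):
    return frozenset(add(u, a) for a in A)


def W(u):
    """Weight of u: the smallest norm found in the coset of u."""
    return min(sum(v) for v in coset(u))


def leaders(u):
    """All members of the coset of u whose norm equals the coset weight."""
    return sorted(v for v in coset(u) if sum(v) == W(u))


cosets = {coset(u) for u in product((0, 1), repeat=n)}
print(len(cosets), sorted(min(sum(v) for v in c) for c in cosets))
print(leaders((1, 1, 0, 0, 0)))
```

In this toy code the cosets of weight 2 have two leaders each, so they would lie outside the set $C$ of $n$-tuples whose cosets have unique leaders.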

Define the inner product $u \cdot v$ of two elements $u$ and $v$ of $V$ by

$$u \cdot v = \sum_{i=1}^{n} u_i v_i$$

where the operations are those of $K$; if $u \cdot v = 0$, we say $u$ is orthogonal to $v$, $u \perp v$; if $B \subset V$, $B^{\perp} = \{u \in V : u \perp b \text{ for all } b \in B\}$. Clearly, $A^{\perp}$ is a subspace of $V$ of dimension $n - k$. Let $L$ be an $n \times (n - k)$ matrix whose columns form a basis for $A^{\perp}$. Then $uL$ depends only on the coset $\bar{u}$ to which $u$ belongs, and $\bar{u} \to uL$ is an isomorphism of $V/A$ on $S = V_{n-k}$; henceforward, we identify $V/A$ and $S$ under this isomorphism. A decoding procedure $D$ of interest to us here has at its core a single real-valued function $\varphi$ on $V$, which we call the "decoding function" of the procedure,³ and the computation of a value of $\varphi$ at a (suitably processed) received vector $u$ yields, by relatively trivial computation, $D(u)$, or, perhaps, successively, the digits of $D(u)$. A consequence of the following lemma is that $W$ may serve as a decoding function.


² The chief use of the vector space viewpoint we shall make here is that endomorphisms of the groups involved may be represented by matrices with coefficients in $K$. For further discussion of the standard algebraic notions which appear here, see, e.g., A. A. Albert, "Fundamental Concepts of Higher Algebra," University of Chicago Press, Chicago, Ill.; 1956.


Let $e_1, \ldots, e_n$ be the natural basis for $V$: $(e_i)_j = \delta_{ij}$, $i, j = 1, \ldots, n$.

Lemma 1:⁴ Let $u \in V$, $1 \leq i \leq n$. Then $W(u + e_i) < W(u)$ if, and only if, some leader of $\bar{u}$ has a one in the $i$th place. Indeed, if $\bar{u}$ has a leader $m$ with $m \cdot e_i = 1$ then $m + e_i$ is a leader of $\overline{u + e_i}$, and if $W(u + e_i) < W(u)$ and $v$ is a leader of $\overline{u + e_i}$, then $v \cdot e_i = 0$ and $v + e_i$ is a leader of $\bar{u}$.

Proof: Suppose $W(u + e_i) < W(u)$ and let $v$ be a leader of $\overline{u + e_i}$. Then $W(u + e_i) = \|v\| < W(u) \leq \|v + e_i\|$ since $v + e_i \in \bar{u}$. Hence, $\|v + e_i\| = \|v\| + 1 = W(u)$ must hold, so $v \cdot e_i = 0$ and $v + e_i$ is a leader of $\bar{u}$. Conversely, suppose $\bar{u}$ has a leader $m$ with $m \cdot e_i = 1$. Then $W(u + e_i) = W(m + e_i) \leq \|m + e_i\| = \|m\| - 1 = W(u) - 1$, i.e., $W(u + e_i) < W(u)$. It follows then by what we have just proved that if $v$ is any leader of $\overline{u + e_i}$, $v$ has a 0 in the $i$th place and $v + e_i$, which has a one in the $i$th place, is a leader of $\bar{u}$. Hence, $\|m + e_i\| = \|v\|$ and $m + e_i$ is a leader of $\overline{u + e_i}$.
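Lemma 1 can be confirmed by exhaustive search on a small example. The sketch below is a numerical check rather than a proof; the four-word code `A` is a hypothetical $(5, 2)$ code chosen for illustration, and the helper names are assumptions.

```python
from itertools import product

n = 5
# Hypothetical (5, 2) code used only for this check; it is closed under addition.
A = [(0, 0, 0, 0, 0), (1, 0, 1, 0, 1), (0, 1, 0, 1, 1), (1, 1, 1, 1, 0)]


def add(u, v):
    return tuple((x + y) % 2 for x, y in zip(u, v))


def W(u):
    """Coset weight of u."""
    return min(sum(add(u, a)) for a in A)


def leaders(u):
    return [v for v in (add(u, a) for a in A) if sum(v) == W(u)]


def unit(i):
    """Natural basis vector e_i, with i counted from 1."""
    return tuple(1 if j == i else 0 for j in range(1, n + 1))


def lemma1_holds(u, i):
    """W(u + e_i) < W(u) exactly when some leader of the coset of u has a one in place i."""
    drops = W(add(u, unit(i))) < W(u)
    return drops == any(v[i - 1] == 1 for v in leaders(u))


checked = all(lemma1_holds(u, i)
              for u in product((0, 1), repeat=n) for i in range(1, n + 1))
print(checked)
```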

Corollary 1: If $\bar{u}$ has the unique leader $m$ and $m \cdot e_i = 1$, then $m + e_i$ is the unique leader of $\overline{u + e_i}$.

Corollary 2: Let $C$ be the subset of $V$ consisting of those $n$-tuples whose cosets have unique leaders. Let $u \in C$, and define the sequence $u^0, u^1, \ldots, u^k$ as follows:

$$u^0 = u, \qquad u^{i+1} = \begin{cases} u^i, & \text{if } W(u^i + e_{i+1}) \geq W(u^i), \\ u^i + e_{i+1}, & \text{if } W(u^i + e_{i+1}) < W(u^i), \end{cases} \qquad i = 0, \ldots, k - 1.$$

Then the function $D$ from $C$ to $V_k$ defined by $D(u) = u^k$ is a decoding procedure for $C$.
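The sequential procedure of Corollary 2 can be sketched on a code in which every coset has a unique leader. Several assumptions are made here and should be read as such: the systematic Hamming $(7, 4)$ code defined by `encode` is a stand-in example, the weight $W$ is computed by brute force over all code words, and $D(u)$ is read as the first $k$ digits of $u^k$.

```python
from itertools import product

n, k = 7, 4


def encode(v):
    """Assumed example: a systematic Hamming (7, 4) encoder (first k digits = message)."""
    a, b, c, d = v
    return (a, b, c, d, (a + b + d) % 2, (a + c + d) % 2, (b + c + d) % 2)


A = [encode(v) for v in product((0, 1), repeat=k)]


def add(u, v):
    return tuple((x + y) % 2 for x, y in zip(u, v))


def W(u):
    """Coset weight of u, by brute force over the code."""
    return min(sum(add(u, a)) for a in A)


def unit(i):
    return tuple(1 if j == i else 0 for j in range(1, n + 1))


def decode(u):
    """Flip digit i (i = 1..k) whenever doing so lowers the coset weight."""
    for i in range(1, k + 1):
        flipped = add(u, unit(i))
        if W(flipped) < W(u):
            u = flipped
    return u[:k]   # interpretation assumed here: D(u) is the first k digits of u^k


message = (1, 0, 1, 1)
received = add(encode(message), unit(2))   # one transmission error in place 2
print(decode(received))
```

In a practical decoder $W$ would be evaluated from the coset image $uL$ rather than by searching the code; the brute-force version is used here only to keep the sketch self-contained.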

Corollary 3: If some leader of $\bar{u}$ has a one in the $i$th place, then every leader of $\overline{u + e_i}$ has a 0 in the $i$th place.

We have as a final corollary a theorem of E. H. Moore 
and D. Slepian. The proof is a variation of the one dis- 
covered by E. Prange. 

An $n$-tuple $u$ is a descendent of an $n$-tuple $v$ if $v \cdot e_i = 1$ whenever $u \cdot e_i = 1$; $u$ is an immediate descendent of $v$ if it is a descendent and $\|u + v\| = 1$.

Define an ordering in $V$ as follows: $u > v$ if, and only if, $u \neq v$ and if $i$ is the smallest of the indexes $j$ for which $u_j \neq v_j$, then $u_i = 1$.

Corollary 4: There exists a subset $C$ of $2^{n-k}$ elements of $V$ consisting exactly of one leader from each coset of $V$ modulo $A$ such that every descendent of an element of $C$ belongs to $C$.

Proof: Let $C$ be the set consisting of the largest leader of each coset (in the ordering of $V$ defined above). It is clearly sufficient to verify that any immediate descendent of an element of $C$ belongs to $C$. Let $u$ be a member of $C$, suppose $\|u\| > 1$ (for otherwise there is nothing to prove) and suppose $u \cdot e_i = 1$. We must show that $u + e_i$ is the largest leader of $\overline{u + e_i}$. Indeed, if $v$ is any leader of $\overline{u + e_i}$, then $v + e_i$ is a leader of $\bar{u}$ by the lemma, so $v + e_i \leq u$ since $u \in C$. But $v \cdot e_i = 0$ by the lemma, so

$$v \leq u + e_i.$$


³ Since $\varphi$ takes on only a finite set of values, it may, of course, be restricted to be integer valued.
⁴ This is a slight refinement of a discovery of E. Prange.


III. THE COMPUTATION OF DECODING FUNCTIONS


Let $V$, $A$, $L$, $S$ be as in Section II. For each $t \in S$ define the function $X_t$ on $S$ by $X_t(s) = (-1)^{st'}$ where $t'$ denotes the transpose of the $1 \times (n - k)$ matrix $t$. The product $X_t X_u$ of two such functions is defined as a function on $S$ by $X_t X_u(s) = X_t(s) X_u(s)$. It is readily verified that the set of all $X_t$: $t \in S$ forms a group $Y$ under this multiplication which is isomorphic to $S$ under $t \to X_t$. $Y$ is called the character group of $S$ and its elements are called the characters of $S$.
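The group property of the characters can be verified directly for a small space. In this sketch the dimension 3 merely stands in for $n - k$, and the names `X`, `add` and `multiplicative` are illustrative assumptions.

```python
from itertools import product


def X(t):
    """Character X_t of S: s -> (-1)^(s t'), the exponent being the dot product mod 2."""
    return lambda s: (-1) ** (sum(a * b for a, b in zip(s, t)) % 2)


dim = 3                                   # stands in for n - k
S = list(product((0, 1), repeat=dim))


def add(t, u):
    return tuple((a + b) % 2 for a, b in zip(t, u))


# The pointwise product of X_t and X_u is the character of t + u,
# which is what makes t -> X_t an isomorphism of S on Y.
multiplicative = all(
    X(t)(s) * X(u)(s) == X(add(t, u))(s) for t in S for u in S for s in S
)
print(multiplicative)
```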

Let $H$ be the set of all functions from $S$ to the real numbers. $H$ is evidently a real vector space of dimension $2^{n-k}$ with the natural inner product $f \cdot g = \sum_{s \in S} f(s) g(s)$ and corresponding norm: $\|f\| = \sqrt{f \cdot f}$. Of course, each character is a member of $H$ of norm $2^{(n-k)/2}$, and it is well-known and easy to see that $Y$ is an orthogonal basis for $H$. Thus, if $\varphi$ is a decoding function, we may design a device for computing $\varphi(u)$: $u \in V$ along the following lines. Assuming, as we do, that $\varphi$ is constant on cosets, it may be identified with the function $f \in H$: $f(uL) = \varphi(u)$. We compute, once and for all, $2^{n-k}$ real coefficients $c_t$, indexed by the members of $S$, as follows:

$$c_t = \frac{1}{2^{n-k}} \sum_{s \in S} f(s) X_t(s).$$

Then

$$f(s) = \sum_{t \in S} c_t X_t(s) \qquad (1)$$

provides a means of computing $\varphi(u)$, for we take $s = uL$, and each $X_t(s)$ is a product of certain of the numbers $(-1)^{s_i}$. Of course, for the method to be practical, many of the coefficients $c_t$ must be zero, and the nonzero ones must take on only a small number of different values. As will be seen, these things occur in a particularly convenient way if there exists a group $\Gamma$ of automorphisms of $S$ such that

$$f(s) = f(t) \ \text{if, and only if, } t = s\alpha \text{ for some } \alpha \in \Gamma. \qquad (2)$$
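Expansion (1) can be illustrated on a small space by computing the coefficients as normalized inner products with the characters and then summing the series back. The names `coefficients` and `expand` are assumptions of this sketch, the dimension 3 stands in for $n - k$, and the norm $\|s\|$ is used as the example function $f$.

```python
from itertools import product

dim = 3                                   # stands in for n - k
S = list(product((0, 1), repeat=dim))


def X(t, s):
    """Character X_t evaluated at s."""
    return (-1) ** (sum(a * b for a, b in zip(s, t)) % 2)


def coefficients(f):
    """c_t = 2^-(n-k) * sum over s of f(s) X_t(s), for every t in S."""
    return {t: sum(f(s) * X(t, s) for s in S) / len(S) for t in S}


def expand(c, s):
    """Right-hand side of expansion (1) at the point s."""
    return sum(c[t] * X(t, s) for t in S)


f = lambda s: sum(s)                      # example function: the norm of s
c = coefficients(f)
print({t: v for t, v in c.items() if v != 0})
```

For this choice of $f$ only four coefficients are nonzero (the constant term and the three characters of norm-1 index), a small instance of the sparsity that the text asks for.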


Define an orbit of a given group $\Gamma$ of automorphisms of $S$ to be a subset of $S$ of the form $s\Gamma = \{s\alpha : \alpha \in \Gamma\}$ for some $s \in S$ and define a level set of a function $g$ on $S$ to be the complete subset of $S$ on which $g$ takes on a given value. Evidently, distinct orbits are disjoint. Condition (2) may be restated as follows: the orbits of $\Gamma$ coincide with the level sets of $f$. Suppose now that we begin the design of a decoding procedure by choosing a tentative decoding function $g$, a function on $S$, for which there exists a nontrivial group $\Gamma$ of automorphisms of $S$ under which $g$ is invariant in the sense that $g \cdot \alpha = g$ for all $\alpha \in \Gamma$ [where $g \cdot \alpha(s) = g(s\alpha)$], i.e., $g$ is constant on the orbits of $\Gamma$. We shall exhibit a natural "computable" orthogonal basis for the functions on $S$ constant on the orbits of $\Gamma$ in terms of which a decoding function $f$ may be constructed which tends to achieve the desired reduction of (1). We suppose the elements of $\Gamma$ to be $n - k$ square matrices (i.e., we represent the linear transformations of $\Gamma$ by matrices relative to the natural basis of $S$) and let $\alpha'$ denote the transpose of the matrix $\alpha$.

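Lemma 2: $X_t \cdot \alpha = X_{t\alpha'}$.

Proof: $X_t \cdot \alpha(s) = X_t(s\alpha) = (-1)^{s\alpha t'} = (-1)^{s(t\alpha')'} = X_{t\alpha'}(s)$.

Let $\Gamma' = \{\alpha' : \alpha \in \Gamma\}$; clearly $\Gamma'$ is a group of $n - k$ square matrices isomorphic to the group $\Gamma$ under $\alpha' \to \alpha^{-1}$. Let $\theta_1', \ldots, \theta_{\kappa'}'$ be the distinct orbits of $S$ under $\Gamma'$ and let $\psi_i$ be the integer-valued function on $S$ defined as follows:

$$\psi_i(s) = \sum_{t \in \theta_i'} X_t(s), \qquad i = 1, \ldots, \kappa'.$$

Let $H_\Gamma$ denote the set of all real-valued functions on $S$ which are constant on the orbits $\theta_1, \ldots, \theta_\kappa$ of $S$ under $\Gamma$; clearly, $H_\Gamma$ is a real vector space of dimension $\kappa$.

Theorem 1: The functions $\psi_1, \ldots, \psi_{\kappa'}$ form an orthogonal basis for $H_\Gamma$. In particular, $\kappa' = \kappa$.

Proof: Since $Y$ is a set of pairwise orthogonal functions on $S$, the linear spans of two disjoint subsets of $Y$ are orthogonal and, hence, the $\psi_i$ are pairwise orthogonal. If $s \in S$ and $\alpha \in \Gamma$,

$$\psi_i(s\alpha) = \sum_{t \in \theta_i'} X_t(s\alpha) = \sum_{t \in \theta_i'} X_t \cdot \alpha(s) = \sum_{t \in \theta_i'} X_{t\alpha'}(s) = \sum_{t \in \theta_i'} X_t(s) = \psi_i(s),$$

i.e., the $\psi_i$ belong to $H_\Gamma$. It follows that $\kappa' \leq \kappa$, and interchanging the roles of $\Gamma$ and $\Gamma'$ (which we may do since $\Gamma'' = \Gamma$), $\kappa \leq \kappa'$. Hence, $\kappa = \kappa'$ and the result follows.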
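Lemma 2 and Theorem 1 can be observed on a toy example in which the orbits of $\Gamma$ and of $\Gamma'$ genuinely differ. Everything specific below is an assumption made for illustration: the $3 \times 3$ matrix `alpha` is a hypothetical generator of a cyclic group $\Gamma$, and the helper names are not from the paper.

```python
from itertools import product


def vec_mat(s, a):
    """Row vector s times matrix a, over the field {0, 1}."""
    return tuple(sum(s[i] * a[i][j] for i in range(len(s))) % 2
                 for j in range(len(a[0])))


def transpose(a):
    return tuple(zip(*a))


def orbits(S, generators):
    """Orbits of S under the (finite) group generated by the given invertible matrices."""
    seen, result = set(), []
    for s in S:
        if s in seen:
            continue
        orbit, frontier = {s}, [s]
        while frontier:
            x = frontier.pop()
            for a in generators:
                y = vec_mat(x, a)
                if y not in orbit:
                    orbit.add(y)
                    frontier.append(y)
        seen |= orbit
        result.append(frozenset(orbit))
    return result


def X(t, s):
    return (-1) ** (sum(a * b for a, b in zip(s, t)) % 2)


def psi(orbit_prime, s):
    """psi_i(s): the sum of the characters X_t over t in an orbit of Gamma'."""
    return sum(X(t, s) for t in orbit_prime)


alpha = ((1, 1, 0), (0, 1, 1), (0, 0, 1))    # hypothetical generator of Gamma
S = list(product((0, 1), repeat=3))
theta = orbits(S, [alpha])                    # orbits under Gamma
theta_prime = orbits(S, [transpose(alpha)])   # orbits under Gamma'
print(len(theta), len(theta_prime))
```

The two decompositions have the same number of sets although the sets themselves are different, and each $\psi_i$ built from an orbit of $\Gamma'$ is constant on the orbits of $\Gamma$.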
Of course, the significant thing about a decoding function is not the precise values it takes on, but rather the decomposition of $S$ into level sets which it induces. By Theorem 1 we can choose a set (in a suitable sense minimal) of $\psi_i$ which separates the level sets of a tentative decoding function $g$ (with $g \in H_\Gamma$), and then some simple linear combination $f$ of these $\psi_i$ will have the same level sets as $g$, or some still simpler linear combination of the $\psi_i$ may serve as a new decoding function. It should be noted that the desideratum for $\Gamma$ is not that it be large, but that it be transitive, or nearly so, on each level set of $g$. That is, given $s$ and $t$ in the same level set, we should like $\Gamma$ to contain an $\alpha$ with $s\alpha = t$; but a large $\Gamma$ which thoroughly mixes the elements of small subsets of level sets might fall far short of transitivity.

Most of the basic ideas will, hopefully, be clarified by the discussion of the example which follows. This section is concluded with a few further remarks on the weight function.


An automorphism of a space $V_j$ is, of course, determined by the assignment of an image to each member of a basis; an automorphism which maps the natural basis on itself is called a coordinate permutation. In general, the statement that an automorphism $M$ of $V_j$ leaves the subset $C$ of $V_j$ "invariant" or "maps it on itself" means that $cM \in C$ for every $c \in C$ (not necessarily that $cM = c$ for every $c \in C$, i.e., that $C$ is pointwise invariant).

In view of Lemma 1, the weight function W appears as 
a natural candidate for the decoding function. The follow- 
ing theorem gives a characterization of the full group of 
automorphisms of S which preserve W. 

Theorem 2: Suppose $\|a\| > 2$ for every $a \in A$ except 0. Let $\pi$ be the group of coordinate permutations of $V$ which leave $A$ invariant and let $\Gamma$ be the group of automorphisms of $S$ which preserve $W$. An isomorphism $P \to M_P$ of $\pi$ on $\Gamma$ is established as follows: $uLM_P = uPL$.

Proof: Let $P \in \pi$ and let $M = M_P$. If $uL = vL$ then $u + v \in A$ so $(u + v)P \in A$ and $0 = (u + v)PL = uPL + vPL$, i.e., $uPL = vPL$, and it follows that $M$ is well-defined as a function from $S$ to $S$. If $u$ and $v$ are arbitrary members of $V$, $(uL + vL)M = (u + v)LM = (u + v)PL = uPL + vPL = uLM + vLM$ and $M$ is an endomorphism of $S$. If $uLM = 0$ then $uPL = 0$ so $uP \in A$; hence, $u \in A$, so $uL = 0$, i.e., $M$ is an automorphism of $S$. Since $P$ obviously preserves weight, we have $W(uLM) = W(uPL) = W(uP) = W(u)$ and $M \in \Gamma$. If $P, Q \in \pi$, $uLM_{PQ} = uPQL = (uP)QL = (uP)LM_Q = uPLM_Q = uLM_P M_Q$, and $P \to M_P$ is a homomorphism of $\pi$ in $\Gamma$. If $M_P = 1$, then $uPL = uLM_P = uL$ so $uP + u \in A$ for all $u \in V$. In particular, $d_i = e_iP + e_i \in A$, and since $\|d_i\| = 0$ or 2, $e_iP = e_i$ must hold by hypothesis and hence $P = 1$, i.e., $P \to M_P$ is an isomorphism of $\pi$ in $\Gamma$. It remains to show that the mapping is onto $\Gamma$. To do this, we choose $M \in \Gamma$ arbitrarily and construct $P \in \pi$ such that $M = M_P$. Indeed, $S_1 = \{e_iL\}_{i=1}^{n}$ are the elements of $S$ of weight 1, and since $S_1$ is invariant under $M$ by assumption, there exists a mapping $\theta$ of $\{1, \ldots, n\}$ in itself such that $e_iLM = e_{\theta(i)}L$. If $\theta(i) = \theta(j)$ then $0 = e_{\theta(i)}L + e_{\theta(j)}L = e_iLM + e_jLM = (e_i + e_j)LM$, whence $(e_i + e_j)L = 0$ since $M$ is nonsingular, so $e_i + e_j \in A$ and hence $i = j$, since $A$ contains no elements of norm 2. Thus, $\theta$ is a permutation and we let $P$ denote the coordinate permutation with $e_iP = e_{\theta(i)}$. If $u \in V$, $uPL = \sum u_i e_i PL = \sum u_i e_{\theta(i)} L = \sum u_i e_i LM = uLM$; hence, if $u \in A$, $uL = 0$ implies $uPL = 0$ implies $uP \in A$ so $P \in \pi$ and the same formula shows that $M = M_P$, thereby completing the proof of Theorem 2.

Corollary 5: Suppose $\|a\| > 2$ for $a \neq 0 \in A$ and let $M$ be an automorphism of $S$. Then $M$ preserves weights if, and only if, $S_1$, the set of elements of $S$ of weight 1, is invariant under $M$.

Corollary 6: Suppose $\|a\| > 2$ for $a \neq 0 \in A$ and let $M$ be an automorphism of $S$. Then $M$ preserves weights if, and only if, it permutes the rows of $L$.

Corollary 7: With the assumption and notation of the statement, let $\alpha$ be any automorphism of $V$ leaving $A$ invariant and preserving $W$. Then the permutation of cosets induced by $\alpha$ coincides with that induced by some member of $\pi$.


IV. DECODING PROCEDURES FOR THE (23, 12) THREE-ERROR-CORRECTING CODE OF GOLAY-PAIGE⁵


For this code, $n = 23$, $k = 12$, and, following [4], we choose $L$ =

[binary matrix $L$; its entries are illegible in this copy]


Observe that $A$, which is, of course, uniquely determined by $L$, has the property that for each $v \in V_k$ there is (a necessarily unique) $a \in A$ with $v_1 = a_1, \ldots, v_k = a_k$, and it is natural to take for $T$ this mapping $v \to a$. The key feature of the code is that every coset has a unique leader of norm $\leq 3$, and every element of $V$ of norm $\leq 3$ is a coset leader.

Consider the following two permutations of the natural basis of $S$:

α = (1, 4, 2) (3, 5, 6) (8, 10, 9) (7) (11),
γ = (1, 2, 3) (4, 7, …) [remaining cycles illegible in this copy].

The notation is as follows: $\alpha$ is the permutation which leaves $e_7$ and $e_{11}$ fixed, maps $e_1 \to e_4 \to e_2 \to e_1$, etc. If $\alpha^*$, $\gamma^*$ are the corresponding induced automorphisms of $S$, $\alpha \to \alpha^*$ and $\gamma \to \gamma^*$ establishes an isomorphism of the group $\Gamma$ of permutations of the set $\{1, \ldots, 11\}$ generated by $\alpha$ and $\gamma$ on the group $\Gamma^*$ of automorphisms of $S$ generated by $\alpha^*$ and $\gamma^*$. It is easily checked that $\Gamma^*$ permutes the rows of $L$ and so, by Corollary 6 of Theorem 2 (which applies since every $a \neq 0 \in A$ has norm $\geq 7$), $W$ is invariant under $\Gamma^*$. More specifically, if we number the rows of $L$ $r_1$ to $r_{23}$ (and regard them as members of $S$), the following are invariants of $\Gamma^*$:

1) $\{r_1, \ldots, r_{11}\}$,
2) $r_{12}$,
3) $\{r_{13}, \ldots, r_{23}\}$,
4) $\|s\|$ for all $s \in S$.


⁵ For an equivalent cyclic code with decoding procedure, see [5, 8].
⁶ See Section 4 of [4].


Indeed 1), 2) and 4) are immediate since $\Gamma^*$ consists of coordinate permutations and 3) is readily checked for $\alpha^*$ and $\gamma^*$ (and, of course, accounts for their choice). Let us classify the coset leaders or "errors" into "types" with the following notation: a leader $u$ is of type $(i, j, m)$, where each of $i$, $j$, $m$ is a nonnegative integer, $0 \leq i + j + m \leq 3$, if $u$ has $i$ ones in its first 11 places, $u_{12} = j$, and $m$ ones in the last 11 places. Thus, 1) to 3) may be paraphrased as follows: coset leader type is invariant under $\Gamma^*$. The foregoing suggest that we take for decoding function the refinement $\varphi$ of $W$: $\varphi(u) = (i, j, m)$ where $u$ is a vector whose coset leader is of type $(i, j, m)$ [in this notation, $W(u) = i + j + m$]. Of course, $L$ provides a natural 1 − 1 mapping of coset leaders onto $S$, so that we may speak of the "type" of $s \in S$; let $f$ be the function on $S$: $f(uL) = (\|uL\|, i, j, m)$ where $\varphi(u) = (i, j, m)$.⁷ The following theorem will be established in the next section.

Theorem 3: The orbits of $\Gamma^*$ are distinct level sets of $f$ with the following exception: the set on which $f$ takes on the value $(3, 3, 0, 0)$ consists of two orbits of $\Gamma^*$.

The function $f$ appears to be a little finer than necessary (for the most part we don't care about $\|uL\|$) but the idea is that $\Gamma^*$ provides a convenient method of computing $f$, from which $W$ is trivially obtained, and the additional information with regard to error type may be used to make the decoder more efficient. It is convenient to introduce what amounts to another notation for the members of $S$. Let $S^*$ be the set of subsets of $I = \{1, \ldots, 11\}$ and for $s \in S$ let $s^* \in S^*$: $i \in s^*$ if, and only if, $s_i = 1$. The elements of $\Gamma$ act as permutations of $S^*$ in the natural way: e.g., if $s^* = \{1, 2, 7\}$ then $s^*\alpha = \{4, 1, 7\}$ (i.e., $\alpha$ transforms each element of $s^*$). Clearly, for $\beta \in \Gamma$ and $s \in S$, $(s\beta^*)^* = s^*\beta$; i.e., roughly speaking, replacing $S$ by $S^*$ replaces $\Gamma^*$ by $\Gamma$.

Note that for $\beta^* \in \Gamma^*$, $\beta^{*\prime} = \beta^{*-1}$, since $\beta^*$ is a coordinate permutation and, hence, $\Gamma^{*\prime} = \Gamma^*$ (in other words, taking transposes in $\Gamma^*$ corresponds to taking inverses in $\Gamma$). It will be shown in Section V that $\Gamma$ is doubly transitive; hence, the elements of $S$ of norm 2 fall into a single orbit, as do those of norm 1. Let the function $\psi$ corresponding to the latter orbit be denoted $\psi_1$; thus, $\psi_1(s) = 11 - 2\|s\|$ [and the $\psi$ function corresponding to the former orbit is $(\psi_1^2 - 11)/2$].
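The closed form $\psi_1(s) = 11 - 2\|s\|$ follows from summing the eleven characters indexed by the vectors of norm 1, and the sum over all pairs is then the elementary expression $(\psi_1^2 - 11)/2$. The sketch below checks both statements over all $2^{11}$ members of $S$; the representation of a character by the support of its index and the helper names are assumptions of the sketch.

```python
from itertools import combinations, product

N = 11


def X(support, s):
    """Character indexed by the subset `support` of {0, ..., 10}, evaluated at s."""
    return (-1) ** (sum(s[i] for i in support) % 2)


def psi(orbit, s):
    """Sum of the characters indexed by the members of an orbit."""
    return sum(X(t, s) for t in orbit)


singles = [(i,) for i in range(N)]            # the orbit of norm 1
pairs = list(combinations(range(N), 2))       # the orbit of norm 2

ok = all(
    psi(singles, s) == N - 2 * sum(s)
    and 2 * psi(pairs, s) == psi(singles, s) ** 2 - N
    for s in product((0, 1), repeat=N)
)
print(ok)
```

Both orbits are orbits of the full symmetric group on the 11 coordinates as well, so this check does not depend on the particular generators of $\Gamma$.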


⁷ $f$ is a refinement of $\varphi \cdot L$; of course, it is not integer or real-valued, but it could easily be made so; it seems more natural to treat it in its present form.


There are altogether 24 orbits,⁸ which occur naturally in pairs, for if $\theta$ is an orbit in $S^*$, so is $\bar{\theta} = \{\bar{s}^* : s^* \in \theta\}$ where $\bar{s}^* = \{i \in I : i \notin s^*\}$. Clearly, $\|\bar{s}^*\| = 11 - \|s^*\|$. As already noted, the elements of norm 2 fall into a single orbit, as do those of norm 1 (and, of course, $0 \in S$, i.e., $\emptyset \in S^*$, is an orbit by itself). There are two orbits of norm 3, three of norm 4 and four of norm 5. The orbits of norms 0, 1, 2 are named 0, 1, 2, respectively; orbits of norm 3 to 5 are named in the form $i.j$ where $i$ is the norm and $j$ is an index $= 1, \ldots$ ("numerical order" by "smallest" member). An orbit of norm $> 5$ is the complement of some orbit of norm $\leq 5$ and is given its name primed.


TABLE I

ORBITS 3.1 AND 5.3

[table entries illegible in this copy]


Table I lists simultaneously an orbit of norm 3 and one 
of norm 5 (together with the trivial one of norm 2), both 
of which are of length 55. 

Orbit 5.4 is of length 11 and its members are the rows 
in the following array: 


1  2  4  7 11
1  2  6  8 10
1  3  4  9 10
1  3  5  8 11
1  5  6  7  9
2  3  5  7 10
2  3  6  9 11
2  4  5  8  9
3  4  6  7  8
4  5  6 10 11
7  8  9 10 11


8 See the following section for proofs and further details. 


Zierler: On Decoding Linear Error-Correcting Codes—I 




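Let $\psi_2$ and $\psi_3$ correspond to orbits 3.1 and 5.4,
respectively. Thus, letting $\sigma_i = 1 - 2s_i$,

$$\psi_2(s) = \sigma_1\sigma_2\sigma_3 + \sigma_1\sigma_2\sigma_5 + \sigma_1\sigma_2\sigma_9 + \cdots + \sigma_6\sigma_9\sigma_{10}$$

and

$$\psi_3(s) = \sigma_1\sigma_2\sigma_4\sigma_7\sigma_{11} + \cdots + \sigma_7\sigma_8\sigma_9\sigma_{10}\sigma_{11};$$

note, however, that combinatorial properties of the orbits
lead to potentially much more efficient means of computing
the $\psi$'s (see Section V).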
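The definitions above can be checked numerically. The following is a minimal sketch, not the author's procedure: it assumes the generators $\alpha$ and $\gamma$ as listed in Section V, takes $\psi_1(s) = 11 - 2\|s\|$, and generates orbits 3.1 and 5.4 from the representatives $\{1,2,3\}$ and $\{1,2,4,7,11\}$ of Table II. The helper names (`perm_from_cycles`, `orbit`, `psi`) are illustrative only.

```python
# Generators of the group, read from Section V (cycle notation on 1..11).
ALPHA_CYCLES = [(1, 4, 2), (3, 5, 6), (8, 10, 9)]
GAMMA_CYCLES = [(1, 2, 3), (4, 7, 5, 8, 6, 9), (10, 11)]

def perm_from_cycles(cycles, n=11):
    """Return a dict i -> image of i for a permutation given by disjoint cycles."""
    p = {i: i for i in range(1, n + 1)}
    for cyc in cycles:
        for a, b in zip(cyc, cyc[1:] + cyc[:1]):
            p[a] = b
    return p

GENERATORS = [perm_from_cycles(ALPHA_CYCLES), perm_from_cycles(GAMMA_CYCLES)]

def orbit(seed):
    """All images of the subset `seed` under the group generated by GENERATORS."""
    start = frozenset(seed)
    seen, frontier = {start}, [start]
    while frontier:
        s = frontier.pop()
        for g in GENERATORS:
            image = frozenset(g[x] for x in s)
            if image not in seen:
                seen.add(image)
                frontier.append(image)
    return seen

ORBIT_3_1 = orbit({1, 2, 3})
ORBIT_5_4 = orbit({1, 2, 4, 7, 11})

def psi(orbit_members, s):
    """Sum over the orbit of the product of sigma_i = 1 - 2*s_i over each member:
    +1 if the member meets s in an even number of points, -1 otherwise."""
    s = set(s)
    return sum(-1 if len(s & t) % 2 else 1 for t in orbit_members)

def psi1(s):
    return 11 - 2 * len(s)

def psi2(s):
    return psi(ORBIT_3_1, s)

def psi3(s):
    return psi(ORBIT_5_4, s)

for rep in [(), (1,), (1, 2), (1, 2, 3), (1, 2, 4)]:
    print(rep, psi1(rep), psi2(rep), psi3(rep))
```

The printed triples can be compared against the first rows of Table II.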


TABLE II

Orbit   Representative   Error Type   $\psi_1$   $\psi_2$   $\psi_3$   Orbit Length
0       $\emptyset$      0,0,0          11         +55        +11        1
1       1                1,0,0           9         +25         +1       11
2       1,2              2,0,0           7          +7         -1       55
3.1     1,2,3            3,0,0           5          -7         +5       55
3.2     1,2,4            3,0,0           5          +1         -3      110
4.1     1,2,3,4          2,0,1           3          -9         +3
4.2     1,2,3,10         0,0,3           3          -1         -5
4.3     1,2,4,7          1,1,1           3          +7         +3
5.1     1,2,3,4,5        1,0,1           1         -15         +1
5.2     1,2,3,4,7        1,0,2           1          +1         +1
5.3     1,2,3,4,8        0,1,2           1          -7         -7       55
5.4     1,2,4,7,11       0,1,1           1         +25         +9       11
5.4′                     0,0,1          -1         -25         -9       11
5.3′                     0,0,2          -1          +7         +7       55
5.2′                     2,0,1          -1          -1         -1
5.1′                     1,1,1          -1         +15         -1
4.3′                     1,0,1          -3          -7         -3
4.2′                     1,0,2          -3          +1         +5
4.1′                     1,0,2          -3          +9         -3
3.2′                     2,0,1          -5          -1         +3      110
3.1′                     0,0,3          -5          +7         -5       55
2′                       2,1,0          -7          -7         +1       55
1′                       1,1,0          -9         -25         -1       11
0′                       0,1,0         -11         -55        -11        1


Table II lists a representative of each orbit in $S^*$ of
norm $\le 5$ (its "numerically smallest" member) together
with the values of $\varphi$ (the error = coset leader type),
$\psi_1$, $\psi_2$ and $\psi_3$, and a few orbit lengths. Of course, in general,
$\psi(s^{*\prime}) = \pm\psi(s^*)$.

The level sets of the function $g = (\psi_1, \psi_2)$ [the value
$g(s)$ of a vector $s$ is the ordered pair of numbers $\psi_1(s)$,
$\psi_2(s)$] are exactly the orbits of $\Gamma$. It is a slight refinement
of $f$ [it distinguishes the two orbits of type (3, 0, 0)] and
appears to be an acceptable candidate for the role of
decoding function.

The pair $h = (\psi_1, \psi_3)$ does not quite separate the level
sets of $f$ (nor those of $\varphi$) but in view of the relative
simplicity of the function $\psi_3$ it seems worthwhile to sketch
a decoding procedure in which $h$ is the decoding function.
The motivating notion here is that if we find a function
that falls a little short of accomplishing the desired
separation of level sets (but is otherwise advantageous,
e.g., simple to compute) then it may be possible, in effect,
to use it as a decoding function by bringing into play
some further properties of the code.

Let $d_1, \ldots, d_{23}$ and $e_1, \ldots, e_{11}$ be the natural bases in
$V$ and $S$, respectively. We assume that some $v \in V_{12}$ has
been chosen as message, that $vT \in A$ has been sent and
$u \in V$ has been received with $\|vT + u\| \le 3$. We shall
have occasion to consider, along with $uL = s$,

$$(u + d_i)L = uL + d_iL = s + e_i, \qquad i = 1, \ldots, 11.$$

Let

$$\psi_j^{(i)}(s) = \psi_j(s + e_i), \qquad h_i = (\psi_1^{(i)}, \psi_3^{(i)}), \qquad i = 1, \ldots, 11.$$


The decoding operation might proceed along the following lines.

Compute $\psi_1(s)$.

1) If $\psi_1(s) = 11$, $v_i = u_i$, $i = 1, \ldots, 12$.

2) If $\psi_1(s) = 9$, 7 or 5 there are 1, 2 or 3 errors, respectively,
among the first 11 $u_i$ (and none elsewhere).
Compute $\psi_1^{(i)}(s)$. If $< \psi_1(s)$, $v_i = u_i$; if $> \psi_1(s)$ (in
fact, $= \psi_1(s) + 2$), $v_i = u_i + 1$. Terminate by
finding the right number of errors or by replacing $s$
by $s + e_i$ if $i$ is the smallest integer for which $\psi_1^{(i)}(s) > \psi_1(s)$,
etc., and terminate when $\psi_1 = 11$ (see Corollary
2 of Lemma 1).

3) If $\psi_1(s) = -11$, $v_i = u_i$, $i = 1, \ldots, 11$,
$v_{12} = u_{12} + 1$.

4) If $\psi_1(s) = -7$ or $-9$ there is an error in $u_{12}$ and
2 or 1, respectively, in the first 11 places. Complement
$s$, set $v_{12} = u_{12} + 1$ and go to 2 for $v_1, \ldots, v_{11}$.

If $\psi_1(s) \notin \{11, 9, 7, 5, -7, -9, -11\}$, compute
$\psi_3(s)$, thus obtaining $h(s) = [\psi_1(s), \psi_3(s)]$.

5) If $h(s) = (-1, 7)$, $(3, -5)$, $(-1, -9)$ or $(-5, -5)$,
there are no errors in the first 12 places, i.e., $v_i = u_i$,
$i = 1, \ldots, 12$.

6) If $h(s) = (1, -7)$ or $(1, 9)$, $v_i = u_i$, $i = 1, \ldots, 11$,
$v_{12} = u_{12} + 1$.

7) $h(s) = (3, 3)$ or $(-1, -1)$. As a glance at Table II
will show, this is the first ambiguity, for the error
in $u$ may be (in either case) of either type (2, 0, 1) or
(1, 1, 1). Here, however, we can make use of further
properties of $A$ as follows. Start computing $h_i(s)$,
$i = 1, 2, \ldots, 11$. We obtain eventually the value
(1, 9) if and only if $\varphi(u) = (1, 1, 1)$. In this case, if $i$
is the place where the value was obtained, $v_j = u_j$
for $12 \ne j \ne i$, $v_j = u_j + 1$ for $j = 12, i$. If (1, 9)
is never obtained, $\varphi(u) = (2, 0, 1)$ and we proceed as
follows [of course, this could be done along with the
search for (1, 9)]. Let $i$ be the first place in which
$h_i(s) = (1, 1)$ or $(-3, -3)$. Then compute
$h_j(s + e_i)$, $i \ne j = 1, 2, \ldots, 11$. If for some $j$,
$h_j(s + e_i) = (-1, -9)$, then the two errors in the
first 11 places of $u$ are in places $i$ and $j$. If $h_j(s + e_i) =
(-1, -9)$ holds for no $j$, take for $i$ the second value
for which $h_i(s) = (1, 1)$ or $(-3, -3)$ and repeat.

8) $h(s) = (1, 1)$ or $(-3, -3)$. Then for some $i = 1, \ldots, 11$,
$h_i(s) = (-1, -9)$ respectively $(-1, 7)$ if and
only if $\varphi(u) = (1, 0, 1)$ respectively (1, 0, 2), and
for this $i$, $v_j = u_j$, $j \ne i$, $v_i = u_i + 1$.

9) $h(s) = (-3, 5)$. Choose $i$: $h_i(s) = (-1, 7)$. Then
$v_i = u_i + 1$, $v_j = u_j$, $j \ne i$.

10) $h(s) = (-5, 3)$. Then $\varphi(u) = (2, 0, 1)$ and we
search for $i$ with $h_i(s) = (-3, -3)$; then compute
$h_j(s + e_i)$, $i \ne j = 1, \ldots, 11$. If $h_j(s + e_i) =
(-1, -9)$ for some $j$, the two errors in the first
eleven places of $u$ are in the $i$th and $j$th places. If,
however, $h_j(s + e_i) = (-1, -9)$ holds for no $j$,
take for $i$ the second value for which $h_i(s) =
(-3, -3)$ and repeat.


V. DATA, ETC.


This section gathers together proofs and details omitted
from Section IV.


The Group $\Gamma$


$\Gamma$ is a group of permutations of $I = \{1, \ldots, 11\}$ with
two generators $\alpha$ and $\gamma$; we shall show that it is a doubly
transitive group of order 660. First, we list some useful
elements of $\Gamma$.


$\alpha = (1, 4, 2)(3, 5, 6)(8, 10, 9)$,

$\gamma = (1, 2, 3)(4, 7, 5, 8, 6, 9)(10, 11)$,

$\gamma^2 = (1, 3, 2)(4, 5, 6)(7, 8, 9)$,

$\alpha^2\gamma\alpha = (1, 5, 4)(2, 7, 6, 10, 3, 8)(9, 11)$,

$\alpha\gamma^2\alpha^2 = (2, 6, 4)(1, 3, 5)(7, 9, 10)$,

$\alpha\gamma = (1, 7, 5, 9, 6)(3, 8, 11, 10, 4)$,


($\beta$ and $\delta$ are defined below)


58°58" 7 (as 2, 5) (6, 10, 3, as a 9) Ge 8), 
$\gamma^4 = (1, 2, 3)(4, 6, 5)(7, 9, 8)$,
$\gamma\alpha = (2, 5, 10, 11, 9)(3, 4, 7, 6, 8)$,
a) = ay aya” 
= (2, 5) Bus) 7 aS Geiene 1 fixec 
B = (a°ya) (ary* 
= (2, 6,.8).(8, 45 9) (Gy 11 er): 
6 = yay aya | 
= (4.3) (53 9)(6, <0) O ete): 
8 = (ya)’a°ye (ary)* (ya)? 
= (4, 7, 11) (6, 8, 10) (8, 9, 5), 1 and 2 fixec 
$\delta\beta = (3, 9)(4, 10)(6, 11)(7, 8)$,
$\delta\beta^2 = (3, 5)(4, 6)(8, 11)(7, 10)$.




Rows 13-23 of $L$, regarded as members of $S^*$ and labelled
$a, \ldots, k$ (in a different order) are:


a = {1, 2, 3, 4, 5, 6},
b = {1, 2, 3, 7, 8, 9},
c = {1, 2, 5, 9, 10, 11},
d = {1, 3, 6, 7, 10, 11},
e = {1, 4, 5, 7, 8, 10},
f = {1, 4, 6, 8, 9, 11},
g = {2, 3, 4, 8, 10, 11},
h = {2, 4, 6, 7, 9, 10},
i = {2, 5, 6, 7, 8, 11},
j = {3, 4, 5, 7, 9, 11},
k = {3, 5, 6, 8, 9, 10}.


Let $\Gamma_{12}$ be the six-element subgroup of $\Gamma$ consisting of
$\delta$, $\beta$, $\beta^2$, $\delta\beta$, $\delta\beta^2$ and the identity; evidently, the numbers 1
and 2 are both fixed by all the members of $\Gamma_{12}$. We shall
show that, in fact, $\Gamma_{12}$ is the full subgroup of $\Gamma$ fixing both
1 and 2. Indeed, suppose $\eta \in \Gamma$ fixes both 1 and 2. Observe
first that $\Gamma$ is doubly transitive, for 1 is mapped into
2, 3, ..., 11, respectively, by $\alpha^2$, $\gamma^2$, $\alpha$, $\alpha\gamma^2$, $\alpha\gamma^2\alpha$, $\alpha\gamma$,
$\alpha\gamma^3$, $\alpha\gamma^5$, $\alpha\gamma^3\alpha$, $\alpha\gamma^3\alpha\gamma$, respectively, proving that $\Gamma$ is
transitive. In the subgroup fixing 1, 2 is mapped on 3,
4,--- , 11, respectively, by (ye)"B,, (ya)"Bi, m, B1, %6i, 
Bi, (ya)*, (ya)”, (ya)*, respectively, proving that IT’ is 
doubly transitive. Note, however, that $\Gamma$ is not triply
transitive, for if it were, there would be an element $\varepsilon$
fixing 1 and 2 and mapping 3 on 4. But then $a\varepsilon = a$ must
hold, since $\{1, 2, 3\} \subset a$ implies $\{1, 2, 4\} \subset a\varepsilon$, and $a$
alone contains 1, 2 and 4. But similarly, $b\varepsilon = a$ must
hold, and this contradiction shows that $\Gamma$ is not triply
transitive. Returning now to the arbitrary element $\eta$ of
$\Gamma$ fixing 1 and 2, if $3\eta \notin \{3, 5, 9\}$, a glance at $\Gamma_{12}$ shows
that the subgroup of $\Gamma$ fixing 1 and 2 is transitive, hence
that $\Gamma$ is triply transitive, which is impossible. Hence,
$3\eta \in \{3, 5, 9\}$, so the product of $\eta$ and some element of
$\Gamma_{12}$ fixes 3; if we show that this product is in $\Gamma_{12}$, it will
follow that $\eta$ is in $\Gamma_{12}$. In other words, it is sufficient to
show that $3\eta = 3$ implies $\eta = \delta$ or the identity. Now since
$\eta$ fixes 1, 2 and 3, it has the following invariants: $\{a, b\}$,
$\{e, f\}$, $\{j, k\}$, $\{c\}$, $\{d\}$ and $\{g\}$. Suppose $a\eta = b$. Then
$\{4, 5, 6\} \to \{7, 8, 9\} \to \{4, 5, 6\}$ and $g\eta = g$ implies
$4\eta = 8$, $8\eta = 4$. Similarly, $d\eta = d$, $c\eta = c$, respectively,
imply that $\eta$ contains the cycles (6, 7) and (5, 9),
respectively. The foregoing implies $e\eta = f$ which in turn
implies that $10\eta = 11$, which completes the proof that
$\eta = \delta$ on the assumption that $a\eta = b$. The remaining
possibility is that $\eta$ fixes $a$ and $b$, as well as $c$, $d$ and $g$.
Hence, the following are invariants of $\eta$: $\{4, 5, 6\}$, $\{7, 8, 9\}$,
$\{5, 9, 10, 11\}$, $\{6, 7, 10, 11\}$ and $\{4, 8, 10, 11\}$; comparison
of these shows at once that $\eta$ fixes, in addition to
1, 2 and 3, the numbers 4, ..., 9. It follows that $e\eta = e$
must hold, and hence, since $10\eta = 10$, that $\eta$ is the identity.

Thus, $\Gamma_{12}$ is the full subgroup of $\Gamma$ fixing 1 and 2 and we
have⁹ the following result.

Theorem 4: $\Gamma$ is doubly transitive and its order is
$11 \cdot 10 \cdot 6$. Furthermore, any maximal subgroup of $\Gamma$
fixing two numbers is isomorphic to the symmetric group
of degree 3.
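Theorem 4 can be checked by brute force. The sketch below is an illustration rather than part of the proof: it assumes the cycle forms of $\alpha$ and $\gamma$ given above, composes permutations left to right (matching the paper's convention of writing images on the right), and closes the generators into the full group. The function names are hypothetical.

```python
def perm_from_cycles(cycles, n=11):
    """Permutation as a tuple p with p[x] = image of x (index 0 is unused)."""
    images = list(range(n + 1))
    for cyc in cycles:
        for a, b in zip(cyc, cyc[1:] + cyc[:1]):
            images[a] = b
    return tuple(images)

def compose(p, q):
    """Apply p first, then q, so that x(pq) = (xp)q."""
    return tuple(q[p[x]] for x in range(len(p)))

ALPHA = perm_from_cycles([(1, 4, 2), (3, 5, 6), (8, 10, 9)])
GAMMA = perm_from_cycles([(1, 2, 3), (4, 7, 5, 8, 6, 9), (10, 11)])

def generate(gens):
    """Close a set of generators under multiplication (finite group, so this ends)."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for h in gens:
            gh = compose(g, h)
            if gh not in group:
                group.add(gh)
                frontier.append(gh)
    return group

GROUP = generate([ALPHA, GAMMA])

# Images of the ordered pair (1, 2) and of the ordered triple (1, 2, 3).
PAIR_IMAGES = {(g[1], g[2]) for g in GROUP}
TRIPLE_IMAGES = {(g[1], g[2], g[3]) for g in GROUP}

# Subgroup fixing both 1 and 2, and where it sends the point 3.
STAB_12 = [g for g in GROUP if g[1] == 1 and g[2] == 2]
IMAGES_OF_3 = sorted({g[3] for g in STAB_12})

print(len(GROUP), len(PAIR_IMAGES), len(TRIPLE_IMAGES), len(STAB_12), IMAGES_OF_3)
```

Double transitivity corresponds to all $11 \cdot 10 = 110$ ordered pairs being reached; triple transitivity would require all $11 \cdot 10 \cdot 9 = 990$ ordered triples.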


The Orbits of $\Gamma$


Since $\psi_1$, $\psi_2$ separate the representatives of orbits listed
in Table II, the representatives do indeed lie in distinct
orbits. We shall show that there are no others. It is
sufficient, of course, to consider orbits of norm $\le 5$, and
those of norm $\le 2$ are disposed of at once by double
transitivity. Now every orbit of norm $\ge 2$ contains
members containing 1 and 2 by double transitivity. We
choose as representative for each orbit its numerically
smallest member¹⁰ which will then automatically begin
1, 2, .... Let us begin by considering orbits of norm 4
and suppose that $\{1, 2, x, y\}$ is the representative of an
orbit. If $x \ne 3$, then $x, y \ne 5, 9$, for otherwise we could
apply a member of $\Gamma_{12}$ to replace $x$ or $y$ by 3, thereby
obtaining a numerically smaller member of the same
orbit. Thus, $\{x, y\} \subset \{4, 6, 7, 8, 10, 11\}$ which implies
eo = Ay e{6,(, 8,10, 1); Then y+ 6) fora 1.2) 456) 
a = {1, 2,4, 9}. Thus, y = 7 or 11, and since {1, 2, 4, 7} 
68° = {1, 2,4, 11}, y = 7 must hold. We have proved 
that there is only one orbit of norm 4 that does not have a
representative of the form $\{1, 2, 3, y\}$ and that its
representative is $\{1, 2, 4, 7\}$. Now consider representatives of
the form $\{1, 2, 3, y\}$. A glance at $\gamma$ shows that $y = 4$ or
$y = 10$ must hold, and these do, indeed, yield distinct
orbits.

Consider now a representative of an orbit of norm 5 of 
the form 1, 2, 3, 4, x. If « + 5 then x + 6 for {1, 2, 8, 
4,6} a = {1,2,3, 4, 5}. Further, }1, 2,3, 4, 11}a0g’y? = 
(ie 2.3 407), (129.4 Oly = (hone ae een eae 
3, 4, 10} Bay’a’y*6B'y 6B ay ay = {1, 2, 3, 4, 7}. It 
follows that if a representative is of the form {1, 2, 3, 4, x} 
then x = 5, 7 or 8 (and all of these occur). Now consider 
a representative {1, 2, 3, x, y} with 4 < x < y;a glance 
at y shows that the only possibility for and y is x = 10, 
Y= 1 ut (123, LOe TT beta, lee 2) sean ten te 
remains to show that there is just one orbit with
representative $\{1, 2, x, y, z\}$ with $3 < x$. There is indeed one,
for $a, b, \ldots, j, k$ is an orbit, and so is its complement
which is orbit 5.4 with representative $\{1, 2, 4, 7, 11\}$.
Now, assuming $3 < x < y < z$, $\{x, y, z\} \cap \{5, 9\} = \emptyset$,
for otherwise we could apply a member of $\Gamma_{12}$, which is
transitive in $\{3, 5, 9\}$, to obtain a member of the orbit of
the form $\{1, 2, 3, p, q\}$. Thus $\{x, y, z\} \subset \{4, 6, 7, 8,
10, 11\}$, and since $\Gamma_{12}$ is transitive in this set, we must
have $x = 4$ and $\{y, z\} \subset \{6, 7, 8, 10, 11\}$. Now 8 and 10
must be excluded, for, e.g., $\{1, 2, 4, 8, z\}\alpha^2 = \{1, 2, 4, 9,
z\alpha^2\}$, $\{1, 2, 4, 9, z\alpha^2\}\delta\beta = \{1, 2, 3, 10, z\alpha^2\delta\beta\}$, contrary to
the hypothesis that $\{1, 2, 4, y, z\}$ is a representative.
Thus $\{y, z\} \subset \{6, 7, 11\}$; $y \ne 6$, for $\{1, 2, 4, 6, z\}\alpha =
\{1, 2, 3, 4, z\alpha\}$, contrary to hypothesis, so $y = 7$, $z = 11$
must hold, and this completes the proof that there are
just four orbits of norm 5 with the representatives listed
in Table II.


⁹ See Burnside [1], Section 137, Theorem III.

¹⁰ Let $H = \{n_1, \ldots, n_s\}$, $F = \{m_1, \ldots, m_s\}$ be subsets of
$I$, each with $s$ members, and suppose $n_1 < \cdots < n_s$ and
$m_1 < \cdots < m_s$. "$H$ is numerically smaller than $F$" if when $t$ is the
smallest index such that $n_t \ne m_t$, then $n_t < m_t$.

Finally, consider orbits of norm 3. Since $\Gamma_{12}$ fixes 1 and
2 and is transitive in 3, 5 and 9, $\{1, 2, x\}$: $x = 3, 5, 9$ are
in a single orbit and, similarly, $\{1, 2, y\}$: $y = 4, 6, 7, 8,
10, 11$ are in a single orbit, and this proves that there are
(at most) two orbits of norm 3.
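The orbit count argued above can be confirmed by enumeration. This is a sketch under the assumption that $\alpha$ and $\gamma$ are as listed earlier in this section; it walks the subsets of each norm in lexicographic order (which agrees with "numerically smaller" in footnote 10) and records the first member met in each orbit. All names are illustrative.

```python
from itertools import combinations

CYCLES = {
    "alpha": [(1, 4, 2), (3, 5, 6), (8, 10, 9)],
    "gamma": [(1, 2, 3), (4, 7, 5, 8, 6, 9), (10, 11)],
}

def perm_from_cycles(cycles, n=11):
    p = {i: i for i in range(1, n + 1)}
    for cyc in cycles:
        for a, b in zip(cyc, cyc[1:] + cyc[:1]):
            p[a] = b
    return p

GENS = [perm_from_cycles(c) for c in CYCLES.values()]

def orbit(seed):
    """Orbit of a subset under the group generated by GENS."""
    start = frozenset(seed)
    seen, frontier = {start}, [start]
    while frontier:
        s = frontier.pop()
        for g in GENS:
            image = frozenset(g[x] for x in s)
            if image not in seen:
                seen.add(image)
                frontier.append(image)
    return seen

def orbit_representatives(norm):
    """Numerically smallest member of each orbit of subsets with `norm` elements."""
    reps, covered = [], set()
    for subset in combinations(range(1, 12), norm):
        if frozenset(subset) not in covered:
            reps.append(subset)
            covered |= orbit(subset)
    return reps

REPS = {m: orbit_representatives(m) for m in range(12)}
for m in range(6):
    print(m, REPS[m])
print("total orbits:", sum(len(r) for r in REPS.values()))
```

The representatives printed for norms 0 through 5 can be compared with the second column of Table II.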

The actual listing of the elements of an orbit, e.g., orbit
3.1 in Section IV, is, of course, a slightly more tedious
matter. It was made relatively simple by observing that
$\{3, 5, 9\}$ is in the same orbit as $\{1, 2, 3\}$, that $\{3, 5, 9\}$ is
invariant under $\Gamma_{12}$, and that if $\Lambda$ is a subset of $\Gamma$
containing 110 members which map the ordered pair (1, 2)
on all 110 ordered pairs $(i, j)$, $i \ne j$, then every member
of $\Gamma$ is expressible in the form $\eta\lambda$ where $\eta \in \Gamma_{12}$ and $\lambda \in \Lambda$.
Then $\{3, 5, 9\}\Gamma = \{3, 5, 9\}(\Gamma_{12}\Lambda) = (\{3, 5, 9\}\Gamma_{12})
\Lambda = \{3, 5, 9\}\Lambda$, so we had only to compute the 110
triples $\{3, 5, 9\}\lambda$: $\lambda \in \Lambda$. $\Lambda$ is readily constructed by
choosing a set $\Lambda_1$ of 10 members of $\Gamma$ fixing 1 and mapping
2 on 2, 3, ..., 11 and a set $\Lambda_2$ of 11 members of $\Gamma$ mapping
1 on 1, ..., 11. Then it is not difficult to see that $\Lambda = \Lambda_1\Lambda_2$
has the required properties; of course, one computes
$\{3, 5, 9\}\Lambda_1$ and then applies the members of $\Lambda_2$ to these,
bypassing the tedious and unnecessary construction of $\Lambda$.


Computing $\psi$'s


A block design¹¹ $D$ is an ordered pair of sets $A$, $B$ with
the following properties: $A$ contains a finite number $v$ of
objects or "points" $a_i$, and $B$ consists of $b$ subsets $S_j$ of $A$
called blocks. There are positive integral parameters
$m$, $r$, $\lambda$, in addition to $v$ and $b$, with the following
significance:


each $S_j$ has exactly $m$ elements;
each $a_i$ occurs in exactly $r$ blocks; and
each pair $\{a_i, a_j\}$ occurs in exactly $\lambda$ blocks.


It is not difficult to show that $bm = vr$ and $r(m - 1) =
\lambda(v - 1)$. If $b = v$ (and $m = r$), $D$ is said to be a symmetric
design and in this case we have the following.

Lemma 3:¹² In a symmetric design, two distinct sets
have exactly $\lambda$ objects in common.

Let $\theta$ be an orbit of $\Gamma$ in $S^*$ and let $m$ be its norm. It
follows at once from double transitivity that $I$, $\theta$ is a
block design [with parameters $v = 11$, $b$ = orbit length,
$m$ = norm of $\theta$, $r = bm/v$, $\lambda = bm(m - 1)/v(v - 1)$].
For example, for orbit 3.1, the parameters are $b = 55$,
$m = 3$, $r = 15$, $\lambda = 3$, and we use these facts to compute,
e.g., $\psi_2$ (the sum of orbit 3.1 in the character group) by
relatively painless nonarithmetical means as follows.

$\psi_2$ (orbit 1): Take any member of orbit 1, e.g., $\{1\}$. It
appears in orbit 3.1 $r = 15$ times so $\psi_2(\{1\}) = \psi_2$ (orbit
1) $= -15 + (55 - 15) = +25$.

$\psi_2$ on orbit 2 $= \psi_2(\{1, 2\})$ is computed as follows. 1
and 2 each appear $r = 15$ times in orbit 3.1, $\lambda = 3$ of
them together. Separate appearance gives $-1$, simultaneous
or nonappearance gives $+1$, so $\psi_2(\{1, 2\}) =
-24 + 3 + (55 - 27) = +7$.

$\psi_2$ on orbit 3.1 $= \psi_2(\{1, 2, 3\})$. 1, 2, 3 each appear
$r = 15$ times, $3 \cdot \lambda = 9$ times in pairs, once in a triple
in orbit 3.1. The triple and solo appearances give $-1$'s,
the double and nonappearances give $+1$'s. The triple
accounts for 3 doubles, leaving 6 doubles and 30 singles,
so $\psi_2(\{1, 2, 3\}) = -1 + 6 - 30 + (55 - 37) = -7$,
and so forth.


¹¹ See Hall [3], p. 59, et seq.
¹² See [3], p. 61, Theorem 1.1.
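The counting argument just given reduces to a few lines of integer arithmetic. The sketch below assumes only the design relations $bm = vr$ and $r(m - 1) = \lambda(v - 1)$ quoted above; the function names and the parameter `blocks_containing_triple` are introduced here for illustration.

```python
def design_parameters(v, b, m):
    """Return (r, lam) from bm = vr and r(m - 1) = lam(v - 1); both must be integers."""
    r, rem_r = divmod(b * m, v)
    lam, rem_l = divmod(r * (m - 1), v - 1)
    if rem_r or rem_l:
        raise ValueError("parameters do not describe a block design")
    return r, lam

def psi_point(b, r):
    # The point lies in r blocks (each contributes -1); the other b - r contribute +1.
    return -r + (b - r)

def psi_pair(b, r, lam):
    # Blocks containing exactly one of the two points contribute -1.
    separate = 2 * (r - lam)
    return -separate + (b - separate)

def psi_triple(b, r, lam, blocks_containing_triple):
    n3 = blocks_containing_triple     # blocks containing all three points
    n2 = 3 * lam - 3 * n3             # blocks containing exactly two of them
    n1 = 3 * r - 2 * n2 - 3 * n3      # blocks containing exactly one of them
    odd = n1 + n3                     # odd overlap contributes -1
    return -odd + (b - odd)

r, lam = design_parameters(v=11, b=55, m=3)   # orbit 3.1 as a block design
print(r, lam)
print(psi_point(55, r), psi_pair(55, r, lam), psi_triple(55, r, lam, 1))
```

Passing `blocks_containing_triple=0` gives the value on a triple that is not itself a member of orbit 3.1, such as the representative of orbit 3.2.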

Remark: Since the 55 elements of $S$ of type (0, 0, 2) fall
into a single orbit (see Table II), we already know that
the sum of any two distinct members of rows 13-23 of
$L$ has norm 6. The combinatorial origin of this fact becomes
clear in the following argument. First observe that orbit
5.4, the complement of the orbit formed by rows 13-23 of
$L$, is a symmetric block design with parameter $\lambda = 2$;
hence, by Lemma 3, each pair of its members has two
numbers in common. Now let $s$ and $t$ be distinct members
of rows 13-23 of $L$. Then $(s + t)^* = (s^* \cup t^*) \cap (s^* \cap t^*)'
= (s^{*\prime} \cap t^{*\prime})' \cap (s^{*\prime} \cup t^{*\prime})$ = the numbers in $s^{*\prime} \cup t^{*\prime}$ which
are not common to both; since $s^{*\prime}$ and $t^{*\prime}$ belong to orbit
5.4, each contains three numbers not in the other and so
$(s + t)^*$ has 6 members, $\|s + t\| = 6$.
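The Remark can be verified directly on the eleven sets $a, \ldots, k$. The following sketch assumes the sets as read from the listing of rows 13-23 of $L$ earlier in this section (several entries of that listing are damaged in the scan and were restored using the design properties, so they are an assumption here).

```python
from itertools import combinations

# Rows 13-23 of L as subsets of {1, ..., 11} (restored reading, see lead-in).
BLOCKS = {
    "a": {1, 2, 3, 4, 5, 6},
    "b": {1, 2, 3, 7, 8, 9},
    "c": {1, 2, 5, 9, 10, 11},
    "d": {1, 3, 6, 7, 10, 11},
    "e": {1, 4, 5, 7, 8, 10},
    "f": {1, 4, 6, 8, 9, 11},
    "g": {2, 3, 4, 8, 10, 11},
    "h": {2, 4, 6, 7, 9, 10},
    "i": {2, 5, 6, 7, 8, 11},
    "j": {3, 4, 5, 7, 9, 11},
    "k": {3, 5, 6, 8, 9, 10},
}
POINTS = set(range(1, 12))

# Orbit 5.4 consists of the complements of these sets.
COMPLEMENTS = {name: POINTS - blk for name, blk in BLOCKS.items()}

# Norm of s + t is the size of the symmetric difference of the two sets.
SUM_NORMS = {len(s ^ t) for s, t in combinations(BLOCKS.values(), 2)}

# Lemma 3 with lambda = 2: any two members of orbit 5.4 share two numbers.
COMMON = {len(s & t) for s, t in combinations(COMPLEMENTS.values(), 2)}

print(SUM_NORMS, COMMON)
print(sorted(sorted(c) for c in COMPLEMENTS.values()))
```

The last line prints the eleven members of orbit 5.4 in numerical order.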


The $(\psi_1, \psi_3)$ Decoding Procedure


It is convenient to generalize the notation used for
error type as follows: let $(i, j, m)$, where $i$ and $m$ are
nonnegative integers and $j = 0$ or 1, denote an element of $S$
which is obtained as the sum of $i$ distinct members of
rows 1-11 of $L$, $j$ times row 12 and $m$ distinct members of
rows 13-23 of $L$. This notation may be used to facilitate
arguments which depend on the fact that every nonzero
element of $S$ is uniquely expressible as an $(i, j, m)$ with
$1 \le i + j + m \le 3$. Thus, e.g., in step 7, if $\varphi(s) =
(2, 0, 1)$, we cannot obtain $h(s + e_i) = (1, 9)$, for, if $u$ has
an error in the $i$th place, $s + e_i$ is a (1, 0, 1) and $h(s + e_i) =
(1, 1)$ or $(-3, -3)$, while if $u$ does not have an error in
the $i$th place, $s + e_i$ is a (3, 0, 1); then if $h(s + e_i) =
(1, 9)$, $s + e_i$ is also a (0, 1, 1), so we should have,
symbolically, (3, 0, 1) = (0, 1, 1), which is equivalent to
(3, 0, 0) = (0, 1, 2), or (3, 0, 0) = (0, 1, 0), which are
impossible. A slightly less trivial situation is encountered
in step 7, i.e., $h(s) = (3, 3)$ or $(-1, -1)$, if $\varphi(s) = (1, 1, 1)$
and for some $i$, $h(s + e_i) = (1, 1)$. This can occur; then
the (1, 1) indicates not orbit 5.1, which it would if $\varphi(s)$
had been equal to (2, 0, 1), but rather orbit 5.2. We have,
symbolically, (2, 1, 1) = (1, 0, 2) which includes (3, 0, 0) =
(0, 1, 3), which is certainly possible. Naturally, the
decoder could not know at this point that $\varphi(s) = (1, 1, 1)$
rather than (2, 0, 1), although it could avoid the difficulty
by searching for $h_i(s) = (1, 9)$ first. However, the decoder
performs correctly without the preliminary search, for it
would not find $j$ with $h_j(s + e_i) = (-1, -9)$. Indeed, in
order to do so we should have to have (3, 1, 1) = (2, 0, 2) =
(0, 0, 1), and the last equation is equivalent to (2, 0, 0) =
(0, 0, 3) or (2, 0, 0) = (0, 0, 1), which are impossible. The
other facts used in the procedure may be established in a 




BIBLIOGRAPHY 


[1] W. Burnside, “Theory of Groups of Finite Order,’ Dover Publi- 
cations, Inc., New York, N. Y.; 1955. 

[2] M. J. E. Golay, “Notes on digital coding,” Proc. IRE, vol. 37, 
p. 657; June, 1949. 

[3] M. Hall, “Projective Planes and Related Topics,’’ California 
Institute of Technology, Pasadena; April, 1954. 

[4] L. J. Paige, “‘A note on the Mathieu groups,” Can. J. Math., 
vol. 9, pp. 15-18; January, 1956. 

[5] E. Prange, “Cyclic Error-Correcting Codes in Two Symbols,”’ 
AFCRC-TN-57-103, ASTIA Document No. AD 133749; Sep- 
tember, 1957. 

[6] E. Prange, “Some Cyclic Error-Correcting Codes with Simple 
Decoding Algorithms,’’ AFCRC-TN-58-156, ASTIA Document 
No. AD 152386; April, 1958. 

[7] D. Slepian, “A class of binary signaling alphabets,” Bell Sys. 
Tech. J., vol. 35, pp. 203-234; January, 1956. 

[8] E. Prange, “The Use of Coset Equivalence in the Analysis and 
Decoding of Group Codes,’”? AFCRC-TR-59-164; June, 1959. 


Encoding and Error-Correction Procedures 


for the Bose-Chaudhuri Codes* 


W. W. PETERSON, MEMBER, IRE 


Summary: Bose and Ray-Chaudhuri have recently described
a class of binary codes which for arbitrary $m$ and $t$ are $t$-error
correcting and have length $2^m - 1$ of which no more than $mt$ digits
are redundancy. This paper describes a simple error-correction
procedure for these codes. Their cyclic structure is demonstrated
and methods of exploiting it to implement the coding and correction
procedure using shift registers are outlined. Closer bounds on the
number of redundancy digits are derived.


INTRODUCTION 


Bose and Chaudhuri¹ have recently discovered a
new class of codes with some remarkable properties.
For any positive integers $m$ and $t$, there is a code
in this class that consists of blocks of length $2^m - 1$, that
corrects $t$ errors, and that requires no more than $mt$ parity
check digits. Thus, the codes cover a wide range in rate


* Received by the PGIT, December 6, 1959. Part of this work 
was supported by the U. S. Army Signal Corps, the U. 8. Air Force 
Office of Scientific Research, Air Research and Development Com- 
mand, and the U. 8. Navy Office of Naval Research at the Research 
Laboratory of Electronics, Mass. Inst. Tech., Cambridge, Mass.; 
and part of the work was done at the IBM Research Lab., Yorktown, 
N 


il On leave from the University of Florida, Gainesville. Presently 
at the Dept. of Elec. Engrg. and Res. Lab. of Electronics, Mass. 
Inst. Tech., Cambridge, Mass. ; 

1R. C. Bose and D. K. Ray-Chaudhuri, “On a class of error- 
correcting binary group codes,’ to be published in Information 
and Control. 


and error-correcting ability, unlike most other known
classes of codes.² These codes are a generalization of the
Hamming codes;³ the case $t = 1$ gives the Hamming code
in each case.

In this paper two important properties of these codes 
are described. First, a method for error correction is 
described which is a generalization of the simple error- 
correction procedure that can be used with Hamming 
codes. The procedure requires a number of operations 
which increases only as a small power of the length of the 
codes. 

Second, it is shown that these are cyclic codes* and, 


² The only others of which I am aware are I. S. Reed, "A class
of multiple-error-correcting codes and decoding scheme," IRE
TRANS. ON INFORMATION THEORY, vol. IT-4, pp. 38-49, September,
1954; P. Elias, "Error free coding," IRE TRANS. ON INFORMATION
THEORY, vol. IT-4, pp. 29-37, September, 1954; and I. S. Reed and
G. Solomon, "Polynomial code," to be published in J. Soc. Ind.
Appl. Math.

3 R. W. Hamming, ‘Error detecting and error correcting codes,’’ 
Bell Sys. Tech. J., vol. 29, pp. 147-160; April, 1950. 

⁴ E. Prange, "Some Cyclic Error-Correcting Codes with Simple
Decoding Algorithms," Air Force Cambridge Research Center,
Bedford, Mass., Tech. Note AFCRC-TN-58-156, April, 1958;
"Cyclic Error-Correcting Codes in Two Symbols," Air Force
Cambridge Research Center, Bedford, Mass., Tech. Note AFCRC-TN-57-103,
September, 1957; "The Use of Coset Equivalence in the
Analysis and Decoding of Group Codes," Air Force Cambridge
Research Center, Bedford, Mass., Tech. Rept. AFCRC-TR-59-164,
June, 1959.




therefore, the encoding can be accomplished very efficiently 
with a shift register. The theory of the cyclic structure 
also provides a closer bound on the number of parity 
checks required to correct a given number of errors. 


CONSTRUCTION OF THE BOSE-CHAUDHURI CODES


Given an irreducible polynomial $p(X)$ of degree $m$ with
1 and 0 as coefficients, a representation of the Galois
field with $2^m$ elements, $GF(2^m)$, can be formed. It consists
of all polynomials of degree $m - 1$ or less. They can be
added (modulo 2) term by term in the ordinary way. The
rule for multiplication is to multiply in the ordinary way,
reducing the answer modulo 2 and modulo $p(X)$ to a
polynomial of degree $m - 1$ or less. (That is, consider
$p(X) = 0$, and use this equation to eliminate terms of
power greater than $m - 1$.) It can be shown then that
certain of these polynomials, called primitive elements,
have the property that the first $2^m - 1$ powers of such an
element are exactly all the $2^m - 1$ nonzero field elements.
Also, every nonzero field element is a root of the equation

$$X^{2^m - 1} = 1$$

and conversely. Thus if $\alpha$ is any nonzero element of the field,
$\alpha^{2^m - 1} = 1$.

The field elements can also be thought of as vectors
whose components are the coefficients of the polynomials.
The sum of two vectors corresponds to the sum of the
corresponding polynomials.
The Bose-Chaudhuri codes are described by giving the
matrix of parity check rules, which is the matrix

$$M = \begin{bmatrix}
1 & 1 & \cdots & 1 \\
\alpha & \alpha^{3} & \cdots & \alpha^{2t-1} \\
\alpha^{2} & (\alpha^{3})^{2} & \cdots & (\alpha^{2t-1})^{2} \\
\vdots & \vdots & & \vdots \\
\alpha^{2^m-2} & (\alpha^{3})^{2^m-2} & \cdots & (\alpha^{2t-1})^{2^m-2}
\end{bmatrix} \tag{1}$$

where $\alpha$ is a primitive element of the field.

This is a $(2^m - 1) \times t$ matrix of $GF(2^m)$ elements, but
thinking of each field element as a vector of $m$ binary
digits, this is a $(2^m - 1) \times mt$ matrix of binary digits. A
vector of $2^m - 1$ binary digits is considered a code word
if it satisfies the parity check described by each column;
i.e., if the product of this vector with the matrix is zero.
In other words the set of all code words is the (left) null
space of this matrix.

The code that Bose and Ray-Chaudhuri use as an
example will be used to illustrate the ideas discussed in
this paper. Let $\alpha$ denote a root of the equation $X^4 = X + 1$.
This happens to be a primitive element of the field. Then
the 15 nonzero field elements are given in Table I.

Taking t = 3, the following matrix of parity check 
rules results: 




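TABLE I
REPRESENTATION OF $GF(2^4)$

$\alpha^{0}$  = 1                                        = (1 0 0 0)
$\alpha^{1}$  =     $\alpha$                             = (0 1 0 0)
$\alpha^{2}$  =          $\alpha^2$                      = (0 0 1 0)
$\alpha^{3}$  =                     $\alpha^3$           = (0 0 0 1)
$\alpha^{4}$  = 1 + $\alpha$                             = (1 1 0 0)
$\alpha^{5}$  =     $\alpha$ + $\alpha^2$                = (0 1 1 0)
$\alpha^{6}$  =          $\alpha^2$ + $\alpha^3$         = (0 0 1 1)
$\alpha^{7}$  = 1 + $\alpha$ + $\alpha^3$                = (1 1 0 1)
$\alpha^{8}$  = 1 + $\alpha^2$                           = (1 0 1 0)
$\alpha^{9}$  =     $\alpha$ + $\alpha^3$                = (0 1 0 1)
$\alpha^{10}$ = 1 + $\alpha$ + $\alpha^2$                = (1 1 1 0)
$\alpha^{11}$ =     $\alpha$ + $\alpha^2$ + $\alpha^3$   = (0 1 1 1)
$\alpha^{12}$ = 1 + $\alpha$ + $\alpha^2$ + $\alpha^3$   = (1 1 1 1)
$\alpha^{13}$ = 1 + $\alpha^2$ + $\alpha^3$              = (1 0 1 1)
$\alpha^{14}$ = 1 + $\alpha^3$                           = (1 0 0 1)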

      1 0 0 0   1 0 0 0   1 0 0 0
      0 1 0 0   0 0 0 1   0 1 1 0
      0 0 1 0   0 0 1 1   1 1 1 0
      0 0 0 1   0 1 0 1   1 0 0 0
      1 1 0 0   1 1 1 1   0 1 1 0
      0 1 1 0   1 0 0 0   1 1 1 0
      0 0 1 1   0 0 0 1   1 0 0 0
M =   1 1 0 1   0 0 1 1   0 1 1 0        (2)
      1 0 1 0   0 1 0 1   1 1 1 0
      0 1 0 1   1 1 1 1   1 0 0 0
      1 1 1 0   1 0 0 0   0 1 1 0
      0 1 1 1   0 0 0 1   1 1 1 0
      1 1 1 1   0 0 1 1   1 0 0 0
      1 0 1 1   0 1 0 1   0 1 1 0
      1 0 0 1   1 1 1 1   1 1 1 0


Of these twelve columns, the last one is trivial and the
next to last is a duplicate; these two can be dropped. The
rest are independent, and the result is a code with fifteen
digit code words of which ten are parity checks and five
are information places. The code corrects all triple errors.
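The example can be reproduced mechanically. The sketch below is illustrative only: it assumes the bit convention of Table I (the four digits are the coefficients of $1, \alpha, \alpha^2, \alpha^3$), builds the powers of $\alpha$ from $\alpha^4 = \alpha + 1$, forms the rows $(\alpha^i, \alpha^{3i}, \alpha^{5i})$ of the matrix for $t = 3$, and computes the rank over the binary field. The names `POWERS`, `ROWS` and `rank_gf2` are introduced here.

```python
def gf16_powers():
    """alpha^0 .. alpha^14 as 4-bit ints; bit k is the coefficient of alpha^k."""
    powers = [0b0001]
    for _ in range(14):
        v = powers[-1] << 1          # multiply by alpha
        if v & 0b10000:
            v ^= 0b10011             # reduce using alpha^4 = alpha + 1
        powers.append(v)
    return powers

POWERS = gf16_powers()

def bits(x):
    """Coefficients of 1, alpha, alpha^2, alpha^3, in the order used in Table I."""
    return [(x >> k) & 1 for k in range(4)]

# Row i of the binary parity check matrix: alpha^i, alpha^(3i), alpha^(5i).
ROWS = [
    bits(POWERS[i]) + bits(POWERS[(3 * i) % 15]) + bits(POWERS[(5 * i) % 15])
    for i in range(15)
]

def rank_gf2(rows):
    """Rank over GF(2) by elimination on the leading bit of each row."""
    pivots = {}                      # leading-bit position -> stored row
    for r in rows:
        x = int("".join(map(str, r)), 2)
        while x:
            top = x.bit_length() - 1
            if top not in pivots:
                pivots[top] = x
                break
            x ^= pivots[top]
    return len(pivots)

for row in ROWS:
    print(" ".join(map(str, row)))
print("rank:", rank_gf2(ROWS))
```

The rank equals the number of independent parity checks, so the number of information places is 15 minus the rank.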


AN ERROR-CORRECTION PROCEDURE


Consider the result of multiplying a vector $(r_0, r_1, r_2, \ldots,
r_{n-1})$ of $n = 2^m - 1$ components by the matrix $M$ in
(1). The result is a vector of $t$ Galois field elements. The
first component is

$$r_0 + r_1\alpha + r_2\alpha^2 + \cdots + r_{n-1}\alpha^{n-1} = r(\alpha)$$

where

$$r(X) = r_0 + r_1X + \cdots + r_{n-1}X^{n-1}$$

is the polynomial which corresponds naturally to the
given vector. (In what follows no distinction will be made
between a vector and the corresponding polynomial.)
The other components are clearly $r(\alpha^3), r(\alpha^5), \ldots, r(\alpha^{2t-1})$.

In these terms an equivalent definition of the Bose-Chaudhuri
codes can be given. A vector is a code word
if it is in the left null space of $M$, i.e., if the parity checks
$r(\alpha), r(\alpha^3), r(\alpha^5), \ldots, r(\alpha^{2t-1})$ are zero. This can be
restated as follows:

Definition: A polynomial $s(X)$ is a code vector for a
$t$-error correcting Bose-Chaudhuri code if, and only if,
$\alpha, \alpha^3, \ldots, \alpha^{2t-1}$ are roots of $s(X)$.

The first step in devising a decoding method is to
characterize the information contained in the parity
check calculation for a received vector which may contain
errors. Let $e = (e_0, e_1, \ldots, e_{n-1})$ be the vector of errors,
i.e., if the errors occur in the positions $i_1, i_2, \ldots, i_\nu$, then

$$e_i = 1 \text{ for } i = i_1, i_2, \ldots, i_\nu, \qquad e_i = 0 \text{ otherwise.}$$

There is a one to one correspondence between the elements
of the error vector and the elements of $GF(2^m)$ which
constitute the first column of the parity check matrix $M$
given by (1), $e_i$ corresponding to the element $\alpha^i$ occurring in
the $i$th position in the first column of $M$. The elements
$X_1, X_2, \ldots, X_\nu$ of $GF(2^m)$ which correspond in this way
to $e_{i_1}, e_{i_2}, \ldots, e_{i_\nu}$ may be called the error position numbers.
Thus $X_j = \alpha^{i_j}$ $(j = 1, 2, \ldots, \nu)$.


Lemma 1: If a received vector $r$ has errors in digits
numbered $X_1, X_2, \ldots, X_\nu$, then the parity check vector
$r \times M$ is of the form $(S_1, S_3, S_5, \ldots, S_{2t-1})$ where

$$S_j = \sum_{i=1}^{\nu} X_i^{\,j}. \tag{3}$$

Proof: Assume that the vector $s$ was transmitted, and
$r = s + e$ received, where $e$ has ones in the positions $i_1,
i_2, \ldots, i_\nu$ and zeros in all other positions. In terms of
corresponding polynomials,

$$r(X) = s(X) + e(X)$$

and the result of the parity check calculation is
$[r(\alpha), r(\alpha^3), \ldots, r(\alpha^{2t-1})]$.
But $s(\alpha) = s(\alpha^3) = \cdots = s(\alpha^{2t-1}) = 0$, so that $r(\alpha) =
s(\alpha) + e(\alpha) = e(\alpha)$, $r(\alpha^3) = e(\alpha^3)$, etc. Thus, the result
of the parity check calculation is $[e(\alpha), e(\alpha^3), \ldots,
e(\alpha^{2t-1})]$. But

$$e(\alpha^j) = e_0 + e_1\alpha^{j} + e_2\alpha^{2j} + \cdots + e_{n-1}\alpha^{(n-1)j} = \sum_{i=1}^{\nu} X_i^{\,j} = S_j. \qquad \text{Q.E.D.}$$

It is interesting to note that for $t = 1$, if the error
occurs, for example, in the component numbered $X_1$, then
the result of the parity check calculation is exactly $S_1 = X_1$
which is the Galois field binary code for the error position
number. This is exactly analogous to the method of
error-correction for Hamming codes in which the parity check
calculation gives the ordinary binary code for the position
of the error. In this sense the Bose-Chaudhuri codes for
$t = 1$ are equivalent to the Hamming single-error correcting
code.

The $S_j$ are the power sum symmetric functions.⁵ Thus
the parity checks give the first $t$ odd power sum symmetric
functions. The first $t$ even ones can be found from
the fact that modulo 2, $(a + b)^2 = a^2 + b^2$, and hence

$$S_2 = \sum_{i=1}^{\nu} X_i^{2} = \Bigl(\sum_{i=1}^{\nu} X_i\Bigr)^{2} = S_1^{2}. \tag{4}$$

Similarly, $S_4 = S_2^2$, $S_6 = S_3^2$, etc.
Suppose that there are $t$ errors. Then the error position
numbers $X_1 \cdots X_t$ satisfy the equations

$$S_j = \sum_{i=1}^{t} X_i^{\,j}, \qquad j = 1, 3, \ldots, 2t - 1.$$

This is a set of $t$ equations in $t$ unknowns, the $X_i$. The
solution would tell the positions of the errors. It appears
impossible to solve the equations by any direct method,
and trying all combinations of $t$ of the $2^m - 1$ field elements
would require too many computations. There is, however,
an interesting compromise.

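The elementary symmetric functions $\sigma_i$ are related to
the power sum symmetric functions $S_j$ by Newton's
identities:⁵

$$\begin{aligned}
S_1 - \sigma_1 &= 0 \\
S_2 - S_1\sigma_1 + 2\sigma_2 &= 0 \\
S_3 - S_2\sigma_1 + S_1\sigma_2 - 3\sigma_3 &= 0 \\
S_4 - S_3\sigma_1 + S_2\sigma_2 - S_1\sigma_3 + 4\sigma_4 &= 0 \\
S_5 - S_4\sigma_1 + S_3\sigma_2 - S_2\sigma_3 + S_1\sigma_4 - 5\sigma_5 &= 0
\end{aligned} \tag{5}$$

etc.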


If it is possible to solve Newton's identities for the
elementary symmetric functions $\sigma_i$, the error position
numbers must satisfy the equation

$$X^t - \sigma_1X^{t-1} + \sigma_2X^{t-2} - \cdots \pm \sigma_t
= (X - X_1)(X - X_2) \cdots (X - X_t) = 0. \tag{6}$$

Eq. (6) can be solved effectively by merely substituting
each of the $n = 2^m - 1$ field elements into the equation.
For each digit in the received vector, the corresponding
$GF(2^m)$ element is substituted in the equation. If the
equation is satisfied, this bit is wrong and must be changed.
If the equation is not satisfied, the bit is correct.
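The whole procedure (parity checks, Newton's identities, substitution into Eq. (6)) can be sketched for the $t = 3$ example code of length 15. This is an illustration under stated assumptions, not the paper's implementation: the field is $GF(2^4)$ with $\alpha^4 = \alpha + 1$; in characteristic 2 the odd-numbered identities in (5) become a linear system in the $\sigma_i$; and when that system is singular the sketch retries with two fewer unknowns, which is one simple way to handle fewer than $t - 1$ errors. All function names are hypothetical.

```python
# GF(2^4) with alpha^4 = alpha + 1; an element is a 4-bit int, bit k = coeff of alpha^k.
EXP = [1]
for _ in range(14):
    v = EXP[-1] << 1                 # multiply by alpha
    if v & 0b10000:
        v ^= 0b10011                 # reduce with alpha^4 + alpha + 1 = 0
    EXP.append(v)
LOG = {v: i for i, v in enumerate(EXP)}
N = 15

def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[(LOG[a] + LOG[b]) % N]

def inv(a):
    return EXP[-LOG[a] % N]

def syndromes(received, t):
    """S[j] = r(alpha^j), j = 1..2t; S[0] = 1 stands for the literal 1 in the matrix."""
    S = [1] * (2 * t + 1)
    for j in range(1, 2 * t + 1):
        if j % 2:                    # odd j: evaluate the parity check directly
            acc = 0
            for i, bit in enumerate(received):
                if bit:
                    acc ^= EXP[(i * j) % N]
            S[j] = acc
        else:                        # even j: square of S_(j/2), as in Eq. (4)
            S[j] = mul(S[j // 2], S[j // 2])
    return S

def solve(matrix, rhs):
    """Gauss-Jordan elimination over GF(2^4); returns None if the matrix is singular."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col]), None)
        if pivot is None:
            return None
        a[col], a[pivot] = a[pivot], a[col]
        p_inv = inv(a[col][col])
        a[col] = [mul(p_inv, x) for x in a[col]]
        for r in range(n):
            if r != col and a[r][col]:
                f = a[r][col]
                a[r] = [x ^ mul(f, y) for x, y in zip(a[r], a[col])]
    return [a[r][n] for r in range(n)]

def locate_errors(received, t=3):
    S = syndromes(received, t)
    k, sigma = t, None
    while k >= 1 and sigma is None:
        # Row r of the system encodes Newton's identity for S_(2r+1) in characteristic 2.
        M = [[S[2 * r - c] if 2 * r - c >= 0 else 0 for c in range(k)] for r in range(k)]
        sigma = solve(M, [S[2 * r + 1] for r in range(k)])
        if sigma is None:
            k -= 2
    if sigma is None:
        return []
    errors = []
    for i in range(N):               # substitute every field element alpha^i into Eq. (6)
        acc = 1
        for s in sigma:              # Horner evaluation of X^k + s1 X^(k-1) + ... + sk
            acc = mul(acc, EXP[i]) ^ s
        if acc == 0:
            errors.append(i)
    return errors

received = [1] * N                   # the all-ones word satisfies every parity check
for i in (1, 5, 12):
    received[i] ^= 1                 # introduce three errors
print(locate_errors(received))
```

The all-ones vector is used as the transmitted word only because it is easy to see that it is a code word; any other code word would serve.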


⁵ See, for example, van der Waerden, footnote 8; J. Riordan, "An
Introduction to Combinatorial Analysis," John Wiley and Sons,
Inc., New York, N. Y., 1958; T. Muir and W. H. Metzler, "A
Treatise on the Theory of Determinants," ch. 21, 1930; or any
book on the Theory of Equations.




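The proof that it is indeed possible to solve for the
ordinary symmetric functions from the power sum
symmetric functions is given by the following theorem:⁶

Theorem 1: The $k \times k$ matrix

$$M_k = \begin{bmatrix}
1 & 0 & 0 & 0 & \cdots & 0 \\
S_2 & S_1 & 1 & 0 & \cdots & 0 \\
S_4 & S_3 & S_2 & S_1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
S_{2k-4} & S_{2k-5} & S_{2k-6} & S_{2k-7} & \cdots & S_{k-3} \\
S_{2k-2} & S_{2k-3} & S_{2k-4} & S_{2k-5} & \cdots & S_{k-1}
\end{bmatrix}$$

is nonsingular if the power sum symmetric functions $S_j$ are
power sums of $k$ or $k - 1$ distinct field elements, and is
singular if the $S_j$ are power sums of fewer than $k - 1$
distinct field elements.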

The proof requires the following two lemmas: 

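Lemma 2: If the $S_j$ are power sums of $v \le k - 2$ distinct
field elements, $M_k$ is singular.

Proof:

$$
M_k \begin{bmatrix} 0 \\ 1 \\ \sigma_1 \\ \vdots \\ \sigma_{k-2} \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
$$

by Newton's identities, (5), and thus $M_k$ has a nontrivial
null space and must be singular. Q.E.D.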

Lemma 3: If the $S_j$ are power sums of $k$ indeterminates
$X_1, \ldots, X_k$, then the determinant

$$|M_k| = \prod_{i<j} (X_i + X_j).$$

Proof: If $X_i = X_j$, all of the power sums contain two
identical terms, which cancel because the field has
characteristic 2 (i.e., 2 = 0). Then it is just as if there
were no more than $k - 2$ distinct elements used in forming
the power sums, and, by Lemma 2, the determinant
is zero. Therefore, $X_i + X_j$ is a factor of the determinant,
for all $i$ and $j$, and the left-hand side must be divisible by
the right-hand side. It is easy to check that the left-hand
side is homogeneous of degree $k(k - 1)/2$, the same as
the right-hand side, and therefore they must differ at
most by a constant factor.

To determine the constant factor, a single special case
suffices. If $k$ is odd, let the $X_i$ be the roots of the equation

$$X^k - 1 = 0.$$


6 Similar results for a real field appear, for example, in H. O.
Foulkes, “Theorems of Kakeya and Pólya on power-sums,” Math.
Z., vol. 65, pp. 345-352; 1956.


IRE TRANSACTIONS ON INFORMATION THEORY 


September 
Then

$$
S_j = \sum_{i=1}^{k} X_i^{\,j} =
\begin{cases}
0, & j \not\equiv 0 \pmod{k}, \\
1, & j \equiv 0 \pmod{k}.
\end{cases}
$$


There will be exactly one 1 in each row and each column
and it follows that $|M_k| = 1$ in this case. For $k$ even,
letting the $X_i$ be all of the roots of the equation


$$X^k - X = 0$$


gives the same result. The constant factor, which could 
be only 0 or 1, must be 1. 
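Lemma 3 and Theorem 1 can be spot-checked numerically in GF(2^4). The sketch below assumes the field is built from the relation alpha^4 = alpha + 1 (the primitive polynomial used for the sample code later in the paper) and stores each element as a 4-bit integer; the names `EXP`, `LOG`, `entry`, `M`, `det`, and `pair_product` are illustrative assumptions, not the paper's notation.

```python
from itertools import combinations

# GF(2^4) from alpha^4 = alpha + 1; bit i of an int is the coefficient of alpha^i.
EXP = [1]
for _ in range(14):
    v = EXP[-1] << 1
    EXP.append(v ^ 0b10011 if v & 0b10000 else v)
LOG = {v: i for i, v in enumerate(EXP)}

def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[(LOG[a] + LOG[b]) % 15]

def entry(xs, j):
    """Matrix entry of index j: S_j for j > 0, the constant 1 for j = 0, 0 for j < 0."""
    if j <= 0:
        return 1 if j == 0 else 0
    s = 0
    for x in xs:
        if x:
            s ^= EXP[(LOG[x] * j) % 15]
    return s

def M(xs, k):
    """The k x k matrix of Theorem 1; row r, column c holds the entry of index 2r - c."""
    return [[entry(xs, 2 * r - c) for c in range(k)] for r in range(k)]

def det(m):
    """Determinant by Gaussian elimination; row swaps need no sign in characteristic 2."""
    m = [row[:] for row in m]
    d = 1
    for c in range(len(m)):
        piv = next((r for r in range(c, len(m)) if m[r][c]), None)
        if piv is None:
            return 0
        m[c], m[piv] = m[piv], m[c]
        d = mul(d, m[c][c])
        inv = EXP[(-LOG[m[c][c]]) % 15]
        for r in range(c + 1, len(m)):
            f = mul(m[r][c], inv)
            m[r] = [a ^ mul(f, b) for a, b in zip(m[r], m[c])]
    return d

def pair_product(xs):
    p = 1
    for a, b in combinations(xs, 2):
        p = mul(p, a ^ b)
    return p

xs = [EXP[1], EXP[4], EXP[6]]      # three distinct field elements
d3 = det(M(xs, 3))                 # k elements: should equal the product in Lemma 3
d4_three = det(M(xs, 4))           # k - 1 elements: should be nonzero
d4_two = det(M(xs[:2], 4))         # fewer than k - 1 elements: should be zero
```

The three determinants illustrate the three cases of Theorem 1 for one particular choice of elements; they are a check on an example, not a proof.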

Now Theorem 1 follows from the fact that if the determinant
$|M_k|$ is zero it must be that some $X_i = X_j$.
Since all of the nonzero $X_i$ are distinct, $X_i = X_j = 0$,
and there were fewer than $k - 1$ errors. Q.E.D.

If there are actually $t - 1$ errors, it can be seen from
Newton's identities, Cramer's Rule and Theorem 1 that
the solution for the $\sigma$'s will yield $\sigma_t = 0$. The corresponding
polynomial equation will have zero as one root.

Now let us review the error-correcting procedure. The
$t$-error correcting Bose-Chaudhuri codes give, as the
parity checks on received sequences, the odd power-sum
symmetric functions up to $S_{2t-1}$, and the intermediate
even functions can be calculated simply from these. If
it is assumed that no more than $t$ errors occur, then by
Theorem 1, with $k = t$, it is either possible to solve for
the error position numbers, or there are $t - 2$ or fewer
errors. In the latter case, $\sigma_{t-1} = \sigma_t = 0$, and two equations
can be dropped, giving a set of $t - 2$ equations in $t - 2$
unknowns to which Theorem 1 can be applied again.
Eventually, if there were any errors at all, a set of equations
that can be solved for the elementary symmetric
functions of the error-position numbers will be found.

The correction procedure consists of three phases:


1) calculate the parity checks and the even-numbered
$S_j$;
2) from these, calculate the elementary symmetric
functions $\sigma_i$; and
3) finally, substitute each field element into the
equation

$$X^t + \sigma_1 X^{t-1} + \sigma_2 X^{t-2} + \cdots + \sigma_t = 0. \tag{7}$$

Those field elements which satisfy this equation correspond
to error positions.

The second step involves a certain amount of trial and
error because it is possible to solve the equations and
obtain correct solutions only when the number of equations
used equals or exceeds by one the number of errors
that actually occur. This step might be carried out, as an
alternative to the procedure described in the preceding
paragraph, by starting with the assumption that two
errors occurred, solving, and checking the solution. If the
solution doesn't check, four errors would be assumed, and
so forth. When a set of answers that checks occurs, it
must be the correct solution.




If it is assumed that the length $n$ of the code approaches
infinity and that the number of errors corrected $t$ is a
fixed fraction of $n$, the number of operations required for
error correction can be crudely estimated as follows. The
first phase, calculating parity checks, requires a number
of operations proportional to the number of digits multiplied
by the number of parity checks, or no more than
$nmt$ operations. This quantity $nmt$ is proportional to $n^2
\log n$. The second phase requires solving a $t \times t$ set of
equations. The number of operations for this task is
typically proportional to $t^3$, but it may have to be done
$t/2$ times. This will increase in the limit no faster than $n^4$.
Finally, substituting in a $t$-degree polynomial requires $t$
multiplications and $t$ additions of $m$ digit numbers, and
must be done $n$ times, so that $2tmn$ is a rough estimate
of the number of operations. This again would vary as
$n^2 \log n$. Thus, the total number of operations certainly
could increase as a small power of $n$.

Consider, as an example, the code corresponding to the
matrix in (2), which corrects triple errors. The appropriate
equations are

$$
\begin{aligned}
S_1 + \sigma_1 &= 0, \\
S_3 + S_2\sigma_1 + S_1\sigma_2 + \sigma_3 &= 0, \\
S_5 + S_4\sigma_1 + S_3\sigma_2 + S_2\sigma_3 &= 0.
\end{aligned} \tag{8}
$$

The parity checks for the received vectors give $S_1$, $S_3$,
and $S_5$. $S_2 = S_1^2$, and $S_4 = S_1^4$. Solving for the $\sigma$'s gives

$$\sigma_1 = S_1, \qquad \sigma_2 = (S_1^2 S_3 + S_5)/(S_1^3 + S_3)$$

and

$$\sigma_3 = (S_1^3 S_3 + S_3^2 + S_1 S_5 + S_1^6)/(S_1^3 + S_3),$$


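provided that $S_1^3 + S_3 \neq 0$. If there is only one error,
$S_3 + S_1^3 = 0$. Furthermore, if $S_1^3 + S_3 = 0$, Newton's
identities yield $\sigma_3 = \sigma_1\sigma_2$, and the equation

$$
\begin{aligned}
X^3 + \sigma_1 X^2 + \sigma_2 X + \sigma_3
&= X^3 + \sigma_1 X^2 + \sigma_2 X + \sigma_1\sigma_2 \\
&= (X + \sigma_1)(X^2 + \sigma_2) = (X + \sigma_1)(X + \sqrt{\sigma_2})^2 = 0
\end{aligned} \tag{9}
$$

has two equal roots, which must be zero, and therefore
there is only one error.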

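As a numerical example, suppose that the vector of all
zeros is transmitted, and that errors occur in the 2nd,
5th, and 7th positions. Then

$$r = (0\,1\,0\,0\,1\,0\,1\,0\,0\,0\,0\,0\,0\,0\,0),$$

$$rM = (1\,0\,1\,1,\ 1\,1\,1\,1,\ 1\,0\,0\,0).$$

Referring to Table I, one finds

$$S_1 = (1\,0\,1\,1) = \alpha^{13}, \qquad S_3 = (1\,1\,1\,1) = \alpha^{12},$$

and

$$S_5 = (1\,0\,0\,0) = \alpha^{0} = 1.$$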


Peterson: Encoding and Error-Correction Procedures for the Bose-Chaudhuri Codes 


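Then,

$$S_1^3 + S_3 = (1\,0\,1\,0) \neq 0,$$

$$\sigma_1 = S_1 = (1\,0\,1\,1) = \alpha^{13},$$

$$\sigma_2 = (S_1^2 S_3 + S_5)/(S_3 + S_1^3) = (0\,0\,1\,0)/(1\,0\,1\,0)
= \alpha^2/\alpha^8 = \alpha^{17}/\alpha^8 = \alpha^9.$$

Similarly,

$$\sigma_3 = \alpha^{11}.$$

It is then easy to verify that the equation

$$X^3 + \sigma_1 X^2 + \sigma_2 X + \sigma_3 = X^3 + \alpha^{13} X^2 + \alpha^9 X + \alpha^{11} = 0$$

is satisfied by the three values $X = \alpha$, $\alpha^4$, and $\alpha^6$, and
only these. These correspond to the errors in $r$.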
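The numerical example can be replayed end to end. The sketch below is illustrative only: it assumes GF(2^4) is generated by alpha^4 = alpha + 1, that digit i of the received vector corresponds to the field element alpha^i (i counted from 0), and that elements are stored as 4-bit integers. All function and variable names are assumptions made for this example.

```python
# GF(2^4) from alpha^4 = alpha + 1; bit i of an int is the coefficient of alpha^i.
EXP = [1]
for _ in range(14):
    v = EXP[-1] << 1
    EXP.append(v ^ 0b10011 if v & 0b10000 else v)
LOG = {v: i for i, v in enumerate(EXP)}

def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[(LOG[a] + LOG[b]) % 15]

def div(a, b):
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 15]

def power(a, e):
    return 0 if a == 0 else EXP[(LOG[a] * e) % 15]

def syndrome(r, j):
    """Phase 1: the power sum S_j = r(alpha^j)."""
    s = 0
    for i, bit in enumerate(r):
        if bit:
            s ^= EXP[(i * j) % 15]
    return s

r = [0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]   # errors in the 2nd, 5th, 7th digits
S1, S3, S5 = (syndrome(r, j) for j in (1, 3, 5))

# Phase 2: solve (8) for the sigmas; valid only when S1^3 + S3 is nonzero.
den = power(S1, 3) ^ S3
sigma1 = S1
sigma2 = div(mul(power(S1, 2), S3) ^ S5, den)
sigma3 = div(mul(power(S1, 3), S3) ^ power(S3, 2) ^ mul(S1, S5) ^ power(S1, 6), den)

def cubic(x):
    return power(x, 3) ^ mul(sigma1, power(x, 2)) ^ mul(sigma2, x) ^ sigma3

# Phase 3: substitute every nonzero field element and flip the digits at the roots.
error_positions = [i for i in range(15) if cubic(EXP[i]) == 0]
corrected = [bit ^ (i in error_positions) for i, bit in enumerate(r)]
```

This covers only the three-error branch; the cases in which `den` is zero (a single error) or fewer equations are needed would have to be handled separately, as described in the text.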


SOME PROPERTIES OF CYCLIC CODES AND
SHIFT REGISTER GENERATORS


Codes for which the code points comprise a cyclic sub- 
space of vectors of zeros and ones have been studied 
recently by Prange,* and, along with theoretical results, 
he found several efficient codes that can be decoded 
easily. He has noted that the codes can be coded with the 
use of a shift-register generator.⁷ In this section, some of
the theory of cyclic codes and linear recurrent sequences 
is reviewed briefly from a point of view that is especially 
well adapted to the study of the Bose-Chaudhuri codes. 

A subset C of vectors of n binary digits is called a cyclic 
subspace if it has the following two properties: 


1) If $v_1$ and $v_2$ are in C, their sum modulo 2 is also in C;
that is, C is a subspace, or subgroup; and

2) if $v = (a_0, a_1, \ldots, a_{n-1})$ is in C, the vector $v' =
(a_{n-1}, a_0, a_1, \ldots, a_{n-2})$ obtained by shifting $v$
cyclically one place is also in C.


Let $R_n$ denote the set of all polynomials

$$a_0 + a_1 X + \cdots + a_{n-1} X^{n-1}$$

of degree less than $n$ with coefficients 1 and 0. They form
a group under modulo 2 addition. Multiplication can be
defined modulo $X^n - 1$; that is, these polynomials can
be multiplied in the ordinary way, modulo 2, and then
reduced again to polynomials of degree less than $n$ by
the use of the equation $X^n = 1$. Then $R_n$ is a ring in the
mathematical sense. A subset $I$ of $R_n$ is called an ideal⁸ if
it satisfies the following two properties:


1) $I$ is a subgroup of $R_n$; and
2) if $p(X)$ is in $I$ and $a(X)$ is in $R_n$, then the product
$p(X)\,a(X)$ is in $I$.


— 


7 N. Zierler, “Linear recurring sequences,” J. Soc. Ind. Appl.
Math., vol. 7, pp. 31-48; March, 1959.

8 Galois fields and other aspects of algebra used in this paper are
treated in many books on modern algebra. See, for example, A. A.
Albert, “Fundamental Concepts of Modern Algebra,” University
of Chicago Press, Chicago, Ill., 1956; G. Birkhoff and S. MacLane,
“A Survey of Modern Algebra,” The Macmillan Co., New York,
N. Y., 1953; B. L. van der Waerden, “Modern Algebra,” F. Ungar
Publishing Co., New York, N. Y., vol. 1 and 2, 1949, 1950.




Considering polynomials $p(X) = a_0 + a_1 X + \cdots +
a_{n-1} X^{n-1}$ to be vectors $(a_0, a_1, \ldots, a_{n-1})$, a cyclic shift
is the same as multiplication by $X$ modulo $X^n - 1$.
Therefore, every ideal is a cyclic subspace. Conversely, if
$p(X)$ is in a cyclic subspace C, so is $Xp(X)$. It follows
that $X^j p(X)$ must also be in C, and since C is a subspace,

$$\sum_j c_j X^j p(X) = p(X) \sum_j c_j X^j$$

must also be in C. Thus, if $p(X)$ is in C, so is the product
of $p(X)$ and any polynomial. Therefore, every cyclic
subspace is an ideal.
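The correspondence between a cyclic shift and multiplication by X modulo X^n - 1 is easy to confirm on a small case. The following is a minimal sketch with polynomials stored as coefficient lists, low order first; the function names are assumptions made for this illustration.

```python
def poly_mul_mod(a, b, n):
    """Product of two GF(2) polynomials (coefficient lists of length n) modulo X^n - 1."""
    out = [0] * n
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                if bj:
                    out[(i + j) % n] ^= 1   # X^n = 1 wraps the exponent around
    return out

def cyclic_shift(v):
    """(a_0, ..., a_{n-1}) -> (a_{n-1}, a_0, ..., a_{n-2})."""
    return v[-1:] + v[:-1]

n = 7
v = [1, 0, 1, 1, 0, 0, 1]
X = [0, 1] + [0] * (n - 2)              # the polynomial X
shifted_by_product = poly_mul_mod(v, X, n)
shifted_directly = cyclic_shift(v)
```

Applying `poly_mul_mod` with X repeatedly gives every cyclic shift of `v`, and sums of such products give all multiples of the polynomial, which is the argument used in the text.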

The important but well-known properties of ideals 
given in the following three lemmas and two theorems are 
proved here to make the paper self-contained. 

Lemma 4: If $p(X)$ and $q(X)$ are in an ideal $I$, the
greatest common divisor (GCD), $d(X)$, of $p(X)$ and
$q(X)$ is in $I$.

This follows directly from the fact that it is always
possible to express $d(X)$ in the form

$$d(X) = a(X)p(X) + b(X)q(X)$$

where $a(X)$ and $b(X)$ are polynomials.

Lemma 5: All polynomials in an ideal $I$ are multiples
of the unique polynomial of least degree in $I$. (That is,
every ideal is a principal ideal.)

Proof: Let $p(X)$ be a polynomial of least degree in $I$.
Then, if $q(X)$ is any other polynomial in $I$, the greatest
common divisor of $p(X)$ and $q(X)$ is in $I$. If $p(X)$ does not
divide $q(X)$, then the greatest common divisor of $p(X)$
and $q(X)$ would have lower degree than $p(X)$, which is a
contradiction. Therefore, every polynomial in $I$ is divisible
by $p(X)$. If $p_1(X)$ and $p_2(X)$ both have minimum degree,
each must be divisible by the other, and hence they are
equal.

The ideal consisting of all multiples of $p(X)$ is denoted
$[p(X)]$. The polynomial of least degree in an ideal is called
its generator.

Lemma 6: The generator $p(X)$ of an ideal is a factor of
$X^n - 1$.

Proof: The GCD $d(X)$ of $p(X)$ and $X^n - 1$ can be
expressed in the form

$$d(X) = a(X)p(X) + b(X)(X^n - 1) = a(X)p(X) \bmod (X^n - 1);$$

hence, $d(X)$ is in the ideal. But $p(X)$ is divisible by $d(X)$,
and since $d(X)$ is in the ideal, $d(X)$ is divisible by $p(X)$.
Hence, $p(X) = d(X)$.

These results can be summarized as follows:

Theorem 2: A set of polynomials is an ideal in the ring
of polynomials modulo $X^n - 1$ if and only if it consists
of all multiples of degree less than $n$ of a factor of $X^n - 1$.

Corollary: If $p(X)$ is a polynomial of degree $k$ which
divides into $X^n - 1$, $[p(X)]$ is a vector space of dimension
$n - k$.

Proof: The elements of $[p(X)]$ are of the form $c(X)p(X)$
where $c(X)$ is an arbitrary polynomial of degree less than
$n - k$. Then the $n - k$ coefficients of $c(X)$ are arbitrary.




Theorem 3: If $p(X)\,q(X) = X^n - 1$, the ideals $[p(X)]$
and $[q(X)]$ are null spaces of each other. That is, a polynomial
$p_1(X)$ is in $[p(X)]$ if, and only if, $p_1(X)\,q_1(X) = 0$
modulo $(X^n - 1)$ for every polynomial $q_1(X)$ in $[q(X)]$.

Proof: Since $p_1(X)$ is in $[p(X)]$, $p_1(X)$ is a multiple of
$p(X)$, for example, $a(X)\,p(X)$. Similarly, $q_1(X) = b(X)\,
q(X)$. Then $p_1(X)\,q_1(X) = a(X)\,b(X)\,(X^n - 1) = 0$.
Conversely, if $p_1(X)\,q(X) = 0$, then $p_1(X)\,q(X)$ must be
a multiple of $X^n - 1$, and $p_1(X)$ must be a multiple of
$(X^n - 1)/q(X) = p(X)$.

Note that the fact that the product of two polynomials
is zero implies that the dot product of the corresponding
two vectors is zero, if in one of them the order of the
components is reversed. That is, if

$$(a_0 + a_1 X + \cdots + a_{n-1} X^{n-1})(b_0 + b_1 X + \cdots + b_{n-1} X^{n-1}) = 0$$

then

$$(a_0, a_1, \ldots, a_{n-1}) \cdot (b_{n-1}, b_{n-2}, \ldots, b_0)
= a_0 b_{n-1} + a_1 b_{n-2} + \cdots + a_{n-1} b_0 = 0$$

since this is the coefficient of $X^{n-1}$ in the product of the
polynomials. Hence, if $[p(X)]$ and $[q(X)]$ are null spaces
of each other, the corresponding vector spaces are null
spaces of each other provided that the order of components
in the vectors of one of these is reversed.

Now let us consider a recursion relation (or difference
equation) of the form

$$\sum_{j=0}^{k} a_j R_{i-j} = 0 \tag{10a}$$

or

$$R_i = \sum_{j=1}^{k} a_j R_{i-j}, \qquad a_0 = a_k = 1. \tag{10b}$$

The solution of these equations for given coefficients $a_j$
will be a sequence of binary digits, $\{R_i\}$. Given the digits
$R_0, \ldots, R_{k-1}$, (10) is the rule for calculating $R_k$, then
$R_{k+1}$, and so forth. Also, the sum of two solutions is again
a solution because the equation is linear. Therefore, the
solutions form a vector space of dimension $k$. The solutions
are characterized in the following theorem.

Theorem 4: Let $p(X) = \sum_{j=0}^{k} a_j X^j$, $a_0 = a_k = 1$, and
let $n$ be the smallest integer for which $X^n - 1$ is divisible
by $p(X)$. Let $q(X) = (X^n - 1)/p(X)$. Then the solutions
of the difference equation

$$R_i = \sum_{j=1}^{k} a_j R_{i-j}$$

are periodic of period $n$, and the set made up of the first
period of each possible solution, considered as polynomials,
is the ideal $[q(X)]$.

Proof: That any vector taken from $[q(X)]$ is a solution
can be seen by multiplying a polynomial from $[q(X)]$, for
example, $q_1(X)$, by $p(X)$. The digits in the product are
formed by the summation in (10a), and, since the product
is zero, (10a) is satisfied. Therefore, any sequence formed
by repetition of a vector taken from $[q(X)]$ is a solution
of (10). Since $q(X) = (X^n - 1)/p(X)$ has degree $n - k$,
then $[q(X)]$ has dimension $k$, by the corollary to Theorem
2. This is the same as the dimension of the space of
solutions, and therefore $[q(X)]$ must include all solutions.


THE CYCLIC STRUCTURE OF THE
BOSE-CHAUDHURI CODES


It is shown in this section that the Bose-Chaudhuri
codes are examples of cyclic codes as studied by Prange.*
As such they can be generated with very simple equipment,
as is illustrated for the (15,5) code in the next section.
Out of this theory also comes a better estimate of the
number of parity check digits required to correct a given
number of errors.

By the alternative definition of the Bose-Chaudhuri
codes given in the second section of this paper, a code
consists of all polynomials $f(X)$ which have $\alpha, \alpha^3, \ldots,
\alpha^{2t-1}$ as roots. Each element $\alpha^j$ of the field is a root of a
unique irreducible polynomial $p_j(X)$ of minimum degree.
Then $f(X)$ must be divisible by each of the polynomials
$p_1(X), p_3(X), \ldots, p_{2t-1}(X)$ and, hence, by their least
common multiple:

$$f(X) = \operatorname*{LCM}_{j = 1, 3, \ldots, 2t-1} [p_j(X)]. \tag{11}$$
1,3, °¢°,28—1 

Since each of the factors $p_j(X)$ is irreducible, the least
common multiple of the $p_j(X)$ is simply the product of
the polynomials $p_j(X)$, with the duplicates omitted.
Duplications are quite possible; they will occur, in fact,
for any $\alpha^i$ and $\alpha^j$ that are roots of the same polynomial
$p_i(X)$. In other words, should $\alpha^i$ and $\alpha^j$ happen to be
roots of the same irreducible polynomial, the columns in
the parity check matrix will be dependent, although not
necessarily identical. The parity checks produced by the
column of powers of $\alpha^j$ will be satisfied if and only if the
parity checks produced by the column of powers of $\alpha^i$ are
satisfied, and thus one set or the other is unnecessary.

Finally, the set of all sequences that comprise the code
can, by Theorem 4, be generated by a recursion relation
defined by the polynomial $(X^n - 1)/f(X)$, and hence by a
shift register generator.

At this point it is interesting to study the limiting
cases of the minimum and maximum numbers of parity
checks. It has already been noted that the nontrivial
minimum is the Hamming code. On the other extreme,
the last two columns which might be included in the
parity check matrix are powers of $\alpha^{2^m - 3} = \alpha^{-2}$ and
$\alpha^{2^m - 1} = 1$. The last one is a root of the irreducible
polynomial $1 + x$ and the resulting code would be the ideal
generated by $(1 + x^n)/(1 + x)$. This ideal consists of the
zero vector and the vector of all ones, so the code is the
trivial repetition of a single information digit $n = 2^m - 1$
times. If $\alpha$ is a primitive element, so is $\alpha^{-2}$, and therefore
the irreducible polynomial of which $\alpha^{-2}$ is a root is primitive.
It can be shown then that when only the last two
columns, corresponding to $\alpha^{-2}$ and 1, are omitted from the


9 See, for example, Birkhoff and MacLane, op. cit., p. 396.




parity check matrix, the resulting code consists of a
maximal length sequence, all its shifts, all complements,
and a sequence of all 1's, which is then the code studied
by San Soucie and Green.¹⁰ This code can also be shown
to be equivalent to the Reed-Muller first-order code with
any one digit dropped.¹¹

It is possible to predict easily which powers of $\alpha$ are
roots of the same polynomial, and thus, incidentally, find
the degree of the polynomial of which $\alpha^j$ is a root. The
method is based on the fact that if $\alpha^j$ is a root of $f(X)$,
then $\alpha^{2j}$ is also, since $f(\alpha^{2j}) = [f(\alpha^j)]^2 = 0$. It turns out that
$\alpha^j, \alpha^{2j}, \alpha^{4j}, \alpha^{8j}, \ldots$ are, in fact, all of the roots. In Table II
information is given for $m = 4$ and 5. Note that in the
first case, $\alpha^{15} = 1$; and in the second, $\alpha^{31} = 1$.

The code for $m = 4$, $t = 3$ has for its generator, by
(11),

$$f(X) = p_1(X)\,p_3(X)\,p_5(X)$$

and therefore has 4 + 4 + 2 = 10 parity checks, and 5
information places. The code for $m = 5$, $t = 5$ has

$$f(X) = p_1(X)\,p_3(X)\,p_5(X)\,p_7(X)$$

for its generator, and therefore has 20 parity checks. All
codes for $m = 4$ and 5 are listed in Table III.
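The counting argument above can be sketched in a few lines: the exponents of the roots of one irreducible factor are obtained by repeated doubling modulo n, and the number of parity checks is the number of distinct exponents collected from j = 1, 3, ..., 2t - 1. The helper names `conjugate_exponents` and `parity_checks` are assumptions made for this illustration.

```python
def conjugate_exponents(j, n):
    """Exponents j, 2j, 4j, ... modulo n; alpha to these powers shares one irreducible factor."""
    out, e = [], j % n
    while e not in out:
        out.append(e)
        e = (2 * e) % n
    return out

def parity_checks(m, t):
    """Degree of the generator (11) for the t-error-correcting code of length 2^m - 1."""
    n = 2 ** m - 1
    roots = set()
    for j in range(1, 2 * t, 2):        # j = 1, 3, ..., 2t - 1
        roots.update(conjugate_exponents(j, n))
    return len(roots)

checks_m4 = [parity_checks(4, t) for t in (1, 2, 3)]
checks_m5 = [parity_checks(5, t) for t in (1, 2, 3, 5, 7)]
```

For m = 5 the values for t = 4 and t = 5 coincide, because the exponent 9 already belongs to the set generated from 5; this is why the 20-check code is listed as correcting five errors.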


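TABLE II
ROOTS OF POLYNOMIALS p_j(X)


         Polynomial            Roots
m = 4    p_1(X)                α, α^2, α^4, α^8
         p_3(X)                α^3, α^6, α^12, α^9
         p_5(X)                α^5, α^10  (α^20 = α^5)
         p_7(X)                α^7, α^14, α^13, α^11
m = 5    p_1(X)                α, α^2, α^4, α^8, α^16
         p_3(X)                α^3, α^6, α^12, α^24, α^17
         p_5(X)                α^5, α^10, α^20, α^9, α^18
         p_7(X)                α^7, α^14, α^28, α^25, α^19
         p_9(X) = p_5(X)
         p_11(X)               α^11, α^22, α^13, α^26, α^21
         p_13(X) = p_11(X)
         p_15(X)               α^15, α^30, α^29, α^27, α^23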


TABLE III
RATE AND ERROR CORRECTION ABILITY OF BOSE-
CHAUDHURI CODES FOR m = 4 AND 5


Length of      Number of        Number of             Number of
Code Words     Parity Checks    Information Places    Errors Corrected
n              n - k            k                     t

15             4                11                    1
15             8                7                     2
15             10               5                     3
31             5                26                    1
31             10               21                    2
31             15               16                    3
31             20               11                    5
31             25               6                     7


10 J. H. Green, Jr. and R. L. San Soucie, “An error-correcting
encoder and decoder of high efficiency,” Proc. IRE, vol. 46, pp.
1741-1744; October, 1958.

11 N. Zierler, “On a variation of the first-order Reed-Muller
Codes,” Lincoln Laboratory Group Rept. 34-80; October, 1958.




Code parameters for some larger codes were calculated 
on the IBM 704 computer. The results are plotted in Fig. 
1. The vertical axis represents rate (percentage of all 
digits available for information), and the horizontal axis 
represents the number of errors correctable as a percentage 
of the total number of digits. The dashed curve represents 
asymptotic values of a lower bound on the rate of the best 
code that corrects errors in a given percentage of the 
digits.¹² The curves drawn for the Bose-Chaudhuri codes
for large n fall below the bound for the best code. In fact,
it is shown in the Appendix that they approach zero as
the length of the code increases indefinitely. This may
mean that these codes are truly not optimum, or it may
mean that the number of errors correctable by the procedure
given in this paper is not the total number of errors
correctable by Bose-Chaudhuri codes in the case of very
long codes.¹³

The polynomial $p_1(X)$ can be any primitive polynomial
of degree $m$. The other polynomials $p_j(X)$ are determined


[Fig. 1 plot not reproduced. Vertical axis: transmission rate = (number of
information digits)/(total number of digits). Horizontal axis: error
correction rate = (multiplicity of correctable errors)/(total number of
digits) = t/n. Legend, n and total number of codes: 31, 5; 255, 33;
1023, 105; 8191, 629; 65535, 413.]


Fig. 1—Error correction and rate for some long Bose-Chaudhuri
codes. (Dashed curve is asymptotic lower bound for the rate for
the best binary code as given by Gilbert.)


12 E. N. Gilbert, “A comparison of signaling alphabets,” Bell
Sys. Tech. J., vol. 31, pp. 504-522; May, 1952.

13 I have found with the aid of the IBM 704 that the Bose-
Chaudhuri two-error correcting codes for m = 4 and 5 correct
some triple errors and nothing beyond and are therefore optimum.
The three-error correcting code for m = 4 corrects 420 quadruple
and 28 quintuple error patterns and is optimum. The three-error
correcting code for m = 5 corrects 13,020 quadruples and 14,756
quintuples and nothing beyond; this seems good but has not been
proved optimum. (See A. B. Fontaine and W. W. Peterson, “Group
code equivalence and optimum codes,” IRE Trans. on Information
Theory, vol. IT-5, pp. 60-70; May, 1959.) Thus, any non-optimum
behavior of these codes occurs only in codes so large that
they are difficult to analyze by looking at code words themselves
or searching for coset leaders even with the aid of a computer.




by the particular choice of $p_1(X)$, and the question arises
as to how they may be calculated. One simple method is
based on the fact that every nonzero element of GF($2^m$) is a root
of the polynomial $X^{2^m - 1} - 1$. Therefore, each element is
a root of one of the factors of $X^{2^m - 1} - 1$. One needs only
to factor this polynomial and test to see which factor has
$\alpha^j$ as a root. The following alternative method is useful.
It has been noted that the degree $m_j$ of $p_j(X)$ can be
easily determined. Then if

$$p_j(X) = a_0 + a_1 X + \cdots + a_{m_j - 1} X^{m_j - 1} + X^{m_j},$$

since $\alpha^j$ is a root of $p_j(X)$,

$$0 = a_0 + a_1 \alpha^{j} + \cdots + a_{m_j - 1} (\alpha^{j})^{m_j - 1} + (\alpha^{j})^{m_j},$$

and if $\alpha^j$ is written as a vector with $m$ components, the
resulting set of linear equations can be solved for the
coefficients $a_i$ of $p_j(X)$.

There is also an explicit formula

$$p(X^{1/j})\, p(\omega X^{1/j}) \cdots p(\omega^{j-1} X^{1/j})$$

where $\omega$ is a primitive $j$th root of unity. It can be shown
that when the multiplication is carried out only integral
powers of $X$ remain, and these have only ones or zeros as
coefficients.

Consider again the sample code discussed by Bose and
Chaudhuri. The irreducible factors of $X^{15} - 1$ are

$$X^{15} - 1 = (X - 1)(X^2 + X + 1)(X^4 + X^3 + X^2 + X + 1)(X^4 + X^3 + 1)(X^4 + X + 1).$$

A root of the last factor was taken as $\alpha$; and thus

$$p_1(X) = X^4 + X + 1.$$

Then $\alpha^3$ satisfies the equation $X^5 - 1 = 0$, since $\alpha^{15} = 1$.
But $X^5 - 1 = (X - 1)(X^4 + X^3 + X^2 + X + 1)$, and
since $\alpha^3$ is not a root of the first factor, it must be a root
of the second. Similarly, $\alpha^5$ satisfies $X^3 - 1 = 0 =
(X - 1)(X^2 + X + 1)$, and so $\alpha^5$ is a root of $X^2 + X + 1$.
The fact that this has degree 2 ties in with the observation
that the column of powers of $\alpha^5$ contained only two
independent parity checks.
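These factors can be cross-checked by a different route from the two methods described in the text: multiply out the linear factors (X + root) over all conjugates of alpha^j. The sketch below assumes GF(2^4) generated by alpha^4 = alpha + 1 and stores polynomials as coefficient lists, low order first; the names `conjugates` and `min_poly` are assumptions made for this illustration.

```python
# GF(2^4) from alpha^4 = alpha + 1; bit i of an int is the coefficient of alpha^i.
EXP = [1]
for _ in range(14):
    v = EXP[-1] << 1
    EXP.append(v ^ 0b10011 if v & 0b10000 else v)
LOG = {v: i for i, v in enumerate(EXP)}

def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[(LOG[a] + LOG[b]) % 15]

def conjugates(j, n=15):
    """Exponents j, 2j, 4j, ... modulo n."""
    out, e = [], j % n
    while e not in out:
        out.append(e)
        e = (2 * e) % n
    return out

def min_poly(j):
    """Coefficients (low order first) of the product of (X + alpha^e) over the conjugates of alpha^j."""
    poly = [1]
    for e in conjugates(j):
        root = EXP[e]
        nxt = [0] * (len(poly) + 1)
        for i, c in enumerate(poly):
            nxt[i] ^= mul(c, root)      # contribution of the constant term of (X + root)
            nxt[i + 1] ^= c             # contribution of the X term
        poly = nxt
    return poly

p1, p3, p5, p7 = (min_poly(j) for j in (1, 3, 5, 7))
```

All coefficients come out as 0 or 1 even though the arithmetic is carried out in GF(2^4), in line with the remark above that only ones or zeros remain as coefficients.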
All code points must be multiples, then, of

$$
\begin{aligned}
f(X) &= p_1(X)\,p_3(X)\,p_5(X) \\
&= (1 + X + X^4)(1 + X + X^2 + X^3 + X^4)(1 + X + X^2) \\
&= 1 + X + X^2 + X^4 + X^5 + X^8 + X^{10} \\
&= (1\,1\,1\,0\,1\,1\,0\,0\,1\,0\,1\,0\,0\,0\,0),
\end{aligned} \tag{12}
$$

and it can easily be checked that this vector, any cyclic
permutation of it, and any sum of permutations, actually
do satisfy the parity checks defined by the matrix M in (2).




MECHANIZING THE CODING AND ERROR-CORRECTION


Shift registers with feedback connections can be used in
a number of ways in mechanizing coding and error-correction
procedures. The following uses will be discussed
in this section:


1) coding using a shift register with one stage for each
information digit in the code,

2) coding using a shift register with one stage for each
parity check digit in the code,

3) counting in the Galois field code,

4) multiplying and dividing Galois field elements, and

5) calculating parity checks on received vectors.


Both the methods of coding apply to any cyclic code.
The methods will be illustrated using the Bose-Chaudhuri
(15, 5) code described by the matrix M in (2).

Every cyclic code is an ideal generated by some polynomial
$f(X)$, i.e., a polynomial is a code vector if and only
if it is divisible by $f(X)$. This means that, by Theorem 4,
a vector is a code vector if and only if it satisfies the
recursion relation corresponding to the polynomial
$(X^n - 1)/f(X)$. For the code used as an example, by (12),

$$f(X) = 1 + X + X^2 + X^4 + X^5 + X^8 + X^{10},$$

$$(X^{15} - 1)/f(X) = 1 + X + X^3 + X^5.$$

Then every sequence satisfying the recursion relation

$$R_i = R_{i-1} + R_{i-3} + R_{i-5}$$

is a code point, and conversely. Such sequences can be
generated by putting information digits in the shift
register generator shown in Fig. 2 and shifting 15 times.
The first five digits coming out will be information digits,
and the next ten digits will be a set of parity checks which
make the whole sequence a code point. The symbols come
out of this encoder low order digits first. The order can
be reversed by reversing the order of the shift register
feedback connections.
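The behavior of this encoder can be imitated in software by applying the recursion directly. The sketch below is not a model of the actual wiring in Fig. 2; it simply assumes the recursion R_i = R_{i-1} + R_{i-3} + R_{i-5} derived from (X^15 - 1)/f(X) = 1 + X + X^3 + X^5, and the function name `encode` is an assumption made here.

```python
def encode(info):
    """Extend 5 information digits to a 15-digit code point with the recursion
    R_i = R_{i-1} + R_{i-3} + R_{i-5} (addition modulo 2)."""
    r = list(info)
    while len(r) < 15:
        r.append(r[-1] ^ r[-3] ^ r[-5])
    return r

# The information digits 1 1 1 0 1 reproduce the coefficient vector of f(X) in (12).
codeword = encode([1, 1, 1, 0, 1])

# One more step of the recursion returns the first digit: the sequence has period 15.
next_digit = codeword[-1] ^ codeword[-3] ^ codeword[-5]
```

Because the recursion is linear, the sum of two outputs of `encode` is again an output, which mirrors the subspace property of the code.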


Fig. 2—A shift register for encoding the Bose-Chaudhuri (15,5) code. 


A second method of coding is based again on the fact
that the coded vector must be, considered as a polynomial,
a multiple of $f(X)$. Let $t_0(X)$ be a polynomial in
which the $k$ coefficients of the terms involving $X^{n-k}$
through $X^{n-1}$ are arbitrary information digits, and the
coefficients of lower order terms are zero. This corresponds
to a vector in which the first $n - k$ components are zero,




the last $k$ digits arbitrary information digits. Then $t_0(X)$
can be divided by $f(X)$ to produce a quotient and a remainder

$$t_0(X) = f(X)q(X) + r(X),$$

where $r(X)$ has degree less than $(n - k)$, which is the
degree of $f(X)$. Then

$$t_0(X) + r(X) = f(X)q(X)$$

and, hence, $t_0(X) + r(X)$ is a code point. But $r(X)$ corresponds
to a vector in which all components except the
first $n - k$ are zero, since $r(X)$ has degree less than $n - k$.
Thus, the sum consists of $n - k$ check digits, the coefficients
of $r(X)$, and $k$ information digits, the coefficients
of $t_0(X)$.
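This second method can be sketched with polynomials stored as integers, bit i holding the coefficient of X^i. The code below uses plain long division rather than a shift register, and the names `F`, `poly_mod`, and `encode_systematic` are assumptions made for this illustration.

```python
F = 0b10100110111    # f(X) = 1 + X + X^2 + X^4 + X^5 + X^8 + X^10

def poly_mod(a, f):
    """Remainder of the GF(2) polynomial a after division by f."""
    df = f.bit_length() - 1
    while a.bit_length() - 1 >= df:
        a ^= f << (a.bit_length() - 1 - df)   # cancel the current leading term
    return a

def encode_systematic(info):
    """Place the 5 information digits on X^10 .. X^14 and add the remainder r(X)."""
    t0 = 0
    for i, bit in enumerate(info):
        t0 |= bit << (10 + i)
    return t0 ^ poly_mod(t0, F)

codewords = []
for value in range(32):
    info = [(value >> i) & 1 for i in range(5)]
    codewords.append(encode_systematic(info))
```

With a single information digit on X^10 the result is f(X) itself, since X^10 reduced modulo f(X) is just the remaining terms of f(X).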

The next problem is to calculate $r(X)$. In general, the
calculation of the remainder after division by a polynomial
can be accomplished with a shift register. The
method is illustrated in Fig. 3(a). Assuming the divisor is
the $f(X)$ for the code used in the example, i.e., $1 + X +
X^2 + X^4 + X^5 + X^8 + X^{10}$, the operation of the circuit
can be understood as follows: The answer is the same as
results from reducing the dividend modulo $f(X)$. This
means that the dividend polynomial should be reduced to
a polynomial of degree less than 10 using the relation

$$X^{10} = 1 + X + X^2 + X^4 + X^5 + X^8. \tag{13}$$


Fig. 3—Shift register for calculating residues modulo f(X) =
1 + X + X^2 + X^4 + X^5 + X^8 + X^10. (a) Basic circuit; (b) basic
circuit with automatic premultiplication by X^10.


Now assume that a single one is shifted into the low-order
position and then shifted right a number of times. Thinking
of the contents of the register as a polynomial with
low order digits at the left, each shift corresponds to
multiplying by $X$, at least until a shift out of the high-order
position. A one in the high-order position corresponds
to $X^9$, and shifting it out makes it $X^{10}$. This
results in the circuit adding into the lower order positions
the equivalent of $X^{10}$ given in (13), and, hence, in this
case the shift still corresponds to multiplying by $X$ and
reducing modulo $f(X)$. Thus, successive shifts give successive
powers of $X$ modulo $f(X)$.




Now this is a linear device, and a polynomial (which 
is the sum of powers of X) can be reduced modulo f(X) 
by shifting it into the device, high power terms first, 
until the constant term is shifted into the low-order 
position. 

In using this device for calculating the $r(X)$ in (17), the
modification shown in Fig. 3(b) can be made to avoid the
last $n - k$ shifts which would add $n - k$ zeros into the
low-order positions. It amounts to multiplying the input
digits by $X^{n-k} = X^{10}$ before adding.

The procedure for coding is then to shift all the information
digits into the device in Fig. 3(a) or 3(b). If the
device in Fig. 3(a) is used, $n - k$ more shifts must be
made with no input. Then the correct check digits remain
in the register and should simply follow the information
digits, high order digits first, to make a complete code
vector. Note that the number of stages in this shift
register is $n - k$, while the shift register shown in Fig. 2
has $k$ stages.

A counter which counts in terms of Galois field elements
is shown in Fig. 4(a). It works on the same principle as the
device shown in Fig. 3(a), but using the primitive polynomial
$p(X) = X^4 + X + 1$ of which $\alpha$ is a root. If a 1 is
placed in the low-order position, successive shifts give
successive powers of $\alpha$ using the relation $\alpha^4 = \alpha + 1$, and
these are exactly the representations of GF($2^4$) elements
given in Table I.
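The counter can be mimicked with a four-stage register whose state (a0, a1, a2, a3) stands for a0 + a1·alpha + a2·alpha^2 + a3·alpha^3. This is a sketch of the arithmetic only, under the assumption alpha^4 = alpha + 1; it does not reproduce the wiring of Fig. 4(a), and the name `shift` is an assumption made here.

```python
def shift(state):
    """One shift = multiplication by alpha: the digit leaving the high-order
    stage is fed back as alpha^4 = 1 + alpha."""
    a0, a1, a2, a3 = state
    return (a3, a0 ^ a3, a1, a2)

state = (1, 0, 0, 0)          # the element 1 in the low-order position
table = []
for _ in range(15):
    table.append(state)       # table[i] is the representation of alpha^i
    state = shift(state)
```

After fifteen shifts the register returns to its starting state, and the fifteen intermediate states are all different, as expected for a primitive polynomial.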


Fig. 4—Galois field counters for GF(2^4). (a) Increasing powers of α;
and (b) decreasing powers of α.


In the device shown in Fig. 4(b), a left shift corresponds
to division by $\alpha$ and a 1 shifted out of the low
order end, $\alpha^{-1}$, is replaced by its equivalent $1 + \alpha^3$. Thus,
this device can count down, or give Galois field elements
in reverse order. A multiplier can be mechanized by
putting one factor in a device A like that shown in Fig.
4(a), the other in a device B like that shown in Fig. 4(b).
Then both devices are shifted until the code for 1 appears
in device B. The product then appears in A. Division can
be done in an analogous manner. Multiplication can also
be done in a manner analogous to that used in digital
computers with a shift register such as that shown in
Fig. 4(a) used in place of an accumulator.

The parity checks corresponding to the first column of 




Galois field elements in the matrix M of (2) correspond to
the Galois field representation of

$$r(\alpha) = r_0 + r_1 \alpha + r_2 \alpha^2 + \cdots + r_{14} \alpha^{14}.$$

This can be calculated by using the relation $\alpha^4 + \alpha +
1 = 0$ to eliminate terms of degree higher than 3 in $\alpha$.
This, in turn, is exactly what will result if the vector
$(r_0, r_1, \ldots, r_{14})$ is shifted into the shift register shown
in Fig. 4(a) high-order digits first. Note that shifting
fifteen times multiplies by $\alpha^{15}$, but $\alpha^{15} = 1$. Similarly, the
device in Fig. 4(b) could be used with the low-order digits
entering first.

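Calculation of the other parity checks is slightly more
complicated. It requires calculating $r(\alpha^j)$ for the first $t$
odd values of $j$. The first step is to devise a shift register
which automatically multiplies by $\alpha^j$. The example $j = 5$
should make the principles clear. Note that

$$
\begin{aligned}
1 \cdot \alpha^5 &= \alpha^5 = \alpha + \alpha^2 \\
\alpha \cdot \alpha^5 &= \alpha^6 = \alpha^2 + \alpha^3 \\
\alpha^2 \cdot \alpha^5 &= \alpha^7 = 1 + \alpha + \alpha^3 \\
\alpha^3 \cdot \alpha^5 &= \alpha^8 = 1 + \alpha^2,
\end{aligned}
$$

so that

$$
\begin{aligned}
\alpha^5 (a_0 + a_1 \alpha + a_2 \alpha^2 + a_3 \alpha^3)
&= a_0(\alpha + \alpha^2) + a_1(\alpha^2 + \alpha^3)
 + a_2(1 + \alpha + \alpha^3) + a_3(1 + \alpha^2) \\
&= (a_2 + a_3) + (a_0 + a_2)\alpha
 + (a_0 + a_1 + a_3)\alpha^2 + (a_1 + a_2)\alpha^3.
\end{aligned}
$$

Thus, the new value of $a_0$ is the old $a_2 + a_3$, the new $a_1$
is the old $a_0 + a_2$, etc. A shift register with feedback
connections shown in Fig. 5 will give this result. Then, if
the received vector $(r_0, r_1, \ldots, r_{14})$ is shifted into this
device, after fifteen shifts the result $r(\alpha^5)$ will remain in
the register.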
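The register update derived above can be checked against five successive multiplications by alpha, and then used to accumulate r(alpha^5) for a received vector. The sketch assumes alpha^4 = alpha + 1 and a state (a0, a1, a2, a3) holding the coefficients of 1, alpha, alpha^2, alpha^3; the function names are assumptions made here, and Fig. 5 itself is not modeled.

```python
from itertools import product

def times_alpha(a):
    """Multiply by alpha, using alpha^4 = 1 + alpha."""
    a0, a1, a2, a3 = a
    return (a3, a0 ^ a3, a1, a2)

def times_alpha5(a):
    """Multiply by alpha^5 in one step, using the connections derived in the text."""
    a0, a1, a2, a3 = a
    return (a2 ^ a3, a0 ^ a2, a0 ^ a1 ^ a3, a1 ^ a2)

def five_single_shifts(a):
    for _ in range(5):
        a = times_alpha(a)
    return a

agrees = all(times_alpha5(a) == five_single_shifts(a)
             for a in product((0, 1), repeat=4))

def parity_check_alpha5(r):
    """Shift r in high-order digit first; the register ends holding r(alpha^5)."""
    state = (0, 0, 0, 0)
    for bit in reversed(r):
        s0, s1, s2, s3 = times_alpha5(state)
        state = (s0 ^ bit, s1, s2, s3)
    return state

r = [0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]   # the received vector of the numerical example
S5 = parity_check_alpha5(r)
```

For the received vector of the earlier numerical example the register ends in the state (1, 0, 0, 0), i.e. the field element 1.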


Fig. 5—A circuit for calculating the parity checks r(α^5).


CONCLUSION


Relatively simple coding and error-correcting methods 
have been described for the Bose-Chaudhuri codes. The 
study of coding and error-correction methods for these 
codes gives additional insight into the remarkable struc- 
ture of the codes. 




APPENDIX 


A bound on the rate of Bose-Chaudhuri codes which
correct $t = 2^{\lambda}$ errors is derived in this Appendix, and it is
shown on the basis of this bound that if $t$ is made a fixed
fraction of $n$, the number of digits in the code, the rate
must approach zero as $n$ increases indefinitely.

This problem is purely number-theoretic, and can be
formulated as follows: The quantity to be studied is the
rate, which is the quotient of the number $k$ of information
digits and $n = 2^m - 1$, the total number of digits. Since
there is one independent parity check for each distinct
residue of $j 2^i$ for $1 \le j < 2t$, $0 \le i < m$, the number of
such residues is $n - k$. Since $2^m = 2^0$ modulo $2^m - 1$, the
condition $0 \le i < m$ can be replaced by $1 \le i \le m$. For
convenience in what follows, $j$ will be allowed to take on
the value zero also; this adds one distinct residue.

Let $N(s)$ be the number of distinct residues of $j2^i$ for
$0 \le j < 2t = 2^{\lambda+1}$ and $m - s < i \le m$. Then


$$n - k = N(m) - 1 \ge N(s) - 1 \quad \text{if } s \le m$$
and
$$k = n - N(m) + 1 = 2^m - N(m) \le 2^m - N(s) \quad \text{if } s \le m. \tag{14}$$


An equation for $N(s)$, valid only for $s \le \lambda$, will be derived,
but this will give an upper bound on $k$ by (14).

Consider first the residues for a particular value of $i$,
$m - \lambda < i \le m$. They can be arranged as follows:




The important facts can be seen clearly in Fig. 6 but
are tedious to prove formally. For each $i$ there are $2^{\lambda+1}$
residues and therefore, in particular, $N(1) = 2^{\lambda+1}$. Two
adjacent columns in Fig. 6 have half their residues in
common. In particular, $N(2) = 2^{\lambda+1} + 2^{\lambda}$. Now in adding
the contributions to $N(s)$ for larger values of $s$ it is necessary
to determine exactly how many residues have
occurred in all previous columns combined. There is one
other case which must be considered besides the previous
adjacent column. Note that the residues and nonresidues
of $j \cdot 2^i$ for a particular value of $i$ fall in blocks of $2^{\lambda+1+i-m}$
successive numbers. In determining which residues for a
particular value of $i$, for example, $i_0$, have occurred before,
each block of $2^{i_0+1}$ successive numbers is treated the same.
Each will have two blocks of $2^{\lambda+1+i_0-m}$ residues. The
first will already have been counted in the $(i_0 + 1)$st column.
The fraction of the others to be omitted is the same as
the fraction of blocks of length $2^{i_0+1}$ which were counted
as residues for $i > i_0 + m - \lambda$, which is the same as
$N(\lambda - i_0)/2^m$. Then, since $s = m - i_0$,

$$N(s) = N(s - 1) + 2^{\lambda}[1 - 2^{-m}N(\lambda - m + s)] \tag{15}$$

for $0 < s \le \lambda$. [$N(s)$ should be considered zero for $s \le 0$.]
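Relation (15) can be compared with a direct count of residues for the case drawn in Fig. 6. The sketch below makes two assumptions: the starting value $N(1) = 2^{\lambda+1}$ is taken from the text, and (15) is applied only for $2 \le s \le \lambda$, with $N$ set to zero for non-positive arguments. The function names are illustrative, not from the paper.

```python
def count_residues(m, lam, s):
    """Count distinct residues of j * 2**i modulo 2**m - 1
    for 0 <= j < 2**(lam + 1) and m - s < i <= m."""
    n = 2**m - 1
    return len({(j * 2**i) % n
                for j in range(2**(lam + 1))
                for i in range(m - s + 1, m + 1)})


def residues_by_recurrence(m, lam):
    """Values N(1), ..., N(lam) from N(1) = 2**(lam + 1) and relation (15)."""
    N = {1: 2**(lam + 1)}
    for s in range(2, lam + 1):
        earlier = N.get(lam - m + s, 0)   # zero for non-positive arguments
        N[s] = N[s - 1] + 2**lam - (2**lam * earlier) // 2**m
    return [N[s] for s in range(1, lam + 1)]


if __name__ == "__main__":
    m, lam = 7, 4   # the case of Fig. 6
    print([count_residues(m, lam, s) for s in range(1, lam + 1)])
    print(residues_by_recurrence(m, lam))
```

For $m = 7$, $\lambda = 4$ both lists agree, which supports the reading of (15) used here for that one case; it is not a proof of the general statement.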


Fig. 6. Distribution of residues of $j2^i$ ($m = 7$, $\lambda = 4$).


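$$\begin{array}{llll}
0 \cdot 2^i, & 1 \cdot 2^i, & \ldots, & (2^{m-i} - 1)2^i \\
(2^{m-i} + 0)2^i, & (2^{m-i} + 1)2^i, & \ldots, & (2 \cdot 2^{m-i} - 1)2^i \\
(2 \cdot 2^{m-i} + 0)2^i, & (2 \cdot 2^{m-i} + 1)2^i, & \ldots, & (3 \cdot 2^{m-i} - 1)2^i \\
\vdots & & & \\
(2^{\lambda+1} - 2^{m-i} + 0)2^i, & (2^{\lambda+1} - 2^{m-i} + 1)2^i, & \ldots, & (2^{\lambda+1} - 1)2^i
\end{array}$$

In this array there are $2^{\lambda+1+i-m}$ rows. Since $2^m = 1$, the
array can be rewritten

$$\begin{array}{llll}
0, & 1 \cdot 2^i, & \ldots, & (2^{m-i} - 1)2^i \\
1, & 1 + 1 \cdot 2^i, & \ldots, & 1 + (2^{m-i} - 1)2^i \\
2, & 2 + 1 \cdot 2^i, & \ldots, & 2 + (2^{m-i} - 1)2^i \\
\vdots & & & \\
2^{\lambda+1+i-m} - 1, & (2^{\lambda+1+i-m} - 1) + 1 \cdot 2^i, & \ldots, & (2^{\lambda+1+i-m} - 1) + (2^{m-i} - 1)2^i
\end{array}$$

This consists exactly of $2^{m-i}$ sets of $2^{\lambda+1+i-m}$ successive
numbers starting at each multiple of $2^i$. The arrangement
is shown graphically in Fig. 6.

Now let

$$R(s) = 1 - N(s) \cdot 2^{-m}.$$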




Since $N(s)$ includes the zero residue, the actual number of
parity digits is at least $N(s) - 1$. The actual number of
information digits is at most $2^m - 1 - N(s) + 1 = 2^m - N(s)$.
The actual rate would be at most $[2^m - N(s)]/(2^m - 1)$,
but for large $m$, this is approximately $R(s)$.
Then

$$N(s) = 2^m[1 - R(s)],$$


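and substitution in (15) results in a difference equation
for $R(s)$:

$$R(s) = R(s - 1) - 2^{\lambda - m}R(s - m + \lambda) \tag{16}$$

for $0 < s$. [$R(s)$ should be considered to be 1 for $s \le 0$.]
Clearly,

$$1 \ge R(s) \ge 0 \quad \text{for all } s. \tag{17}$$

It follows at once from (16) and (17) that $R(s)$ is nonincreasing.
Now if there exists $\epsilon > 0$ such that $R(s) \ge \epsilon$
for all $s$, choose any $s_0 > m - \lambda + (2^{m-\lambda}/\epsilon)$. Then

$$\begin{aligned}
R(s_0) &= [R(s_0) - R(s_0 - 1)] + [R(s_0 - 1) - R(s_0 - 2)] + \cdots \\
&\quad + [R(m - \lambda + 1) - R(m - \lambda)] + R(m - \lambda) && \text{trivially} \\
&= R(m - \lambda) - 2^{\lambda - m}[R(s_0 - m + \lambda) + R(s_0 - m + \lambda - 1) + \cdots + R(1)] && \text{by (16)} \\
&\le R(m - \lambda) - 2^{\lambda - m}(s_0 - m + \lambda)\epsilon && \text{by hypothesis} \\
&< R(m - \lambda) - 1 && \text{by choice of } s_0 \\
&\le 0 && \text{by half of (17),}
\end{aligned}$$

contradicting the other half, and proving
that $R(s) \to 0$ as $s \to \infty$ must hold.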

Now suppose that it is required that errors be corrected
in a fraction $2^{-\nu}$ of the number of digits in a code word.
Then

$$2^{-\nu} = t/n = 2^{\lambda}/(2^m - 1) \approx 2^{\lambda - m},$$

so $\nu \approx m - \lambda$. Then, taking $s = \lambda$, $R(\lambda) = R(m - \nu)$
is an upper bound on the rate for a code with $2^m - 1$
digits. As $m$ increases this approaches zero. Since rate is a
monotone nonincreasing function of the number of errors
correctable and the rate approaches zero for arbitrarily
small fractions $t/n = 2^{-\nu}$, it must approach zero for any
fraction $t/n > 0$.


ACKNOWLEDGMENT 


I have benefited greatly from discussions with many
people at the IBM Research Laboratory and at the
Research Laboratory of Electronics at Massachusetts
Institute of Technology. E. Prange, of the Air Force
Cambridge Research Center, J. Griesmer and J. Selfridge
of IBM, and M. P. Schutzenberger and S. Golomb at the
Research Laboratory of Electronics were especially helpful.

Most of all, I am indebted to R. C. Bose of the Uni- 
versity of North Carolina, for lecturing on his and 
Chaudhuri’s fine work so soon after it was done and for 
the very stimulating discussion we had during his visit 
to the IBM Research Laboratory in August 1959. 

Part of the computation work was done at the M.I.T. 
Computation Center. 


Synchronization of Binary Messages*


E. N. GILBERT†


Summary—When messages are transmitted as blocks of binary 
digits, means of locating the beginnings of blocks are provided 
to keep the receiver in synchronism with the transmitter. Ordi- 
narily, one uses a special synchronizing symbol (which is really 
a third kind of digit, neither 0 nor 1) for this purpose. The Morse 
code letter space and the teletype start and stop pulses are examples. 
If a special synchronizing digit is not available, its function may 
be served by a short sequence of binary digits P which is placed 
as a prefix to each block. The other digits must then be constrained 
to keep the sequence P from appearing within a block. If blocks 
of N digits (including the prefix P) are used, the prefix should be 
chosen to make large the number G(N) of different blocks which
satisfy the constraints. Lengthening the prefix decreases the number
of "message digits" which remain in the block but also relaxes
the constraints. Thus, for each N, there corresponds some optimum 
length of prefix. 


* Received by the PGIT, December 20, 1959.
† Math. Res. Dept., Bell Telephone Labs., Murray Hill, N. J.


For each prefix P, a generating function, a recurrence formula,
and an asymptotic formula for large N are found for G(N). Tables
of G(N) are given for all prefixes of four digits or fewer. Among
all prefixes P of a given length A, the one for which G(N) has the
most rapid growth is P = 11···1. However, for this choice of P,
the table of values of G(N) starts with small values; 11···1 does
not become the best A-digit prefix until N is very large. At these
values of N, the (A + 1)-digit prefix 11···10 is still better.
The tables suggest that, for any N, a best prefix can always be
found in the form 11···10, for suitable A. Taking P = 11···10
and $A = [\log_2 (N \log_2 e)]$ it is shown that G(N) is roughly $0.35\,N^{-1}2^N$.
This result is near optimal since no choice of P can make G(N)
exceed $N^{-1}2^N$.


I. INTRODUCTION


WHEN block coding is used, some care must be
taken to ensure that the transmitter and receiver
stay in synchronism. For example, the Morse
code letter spaces and the teletype stop and start pulses
are used to mark the beginnings of new letters. Without
some synchronizing scheme, a receiver turned on in the
middle of a message might start decoding in the middle
of a letter and emit gibberish.

In binary systems it is rarely practical to use one of the 
digits as a synchronizer. If 1 were used for this purpose, 
the only available codes¹ would be 1, 10, 100, 1000, ... .
As an alternative, one might rely on the self-synchronizing 
ability of a suitable variable-length binary encoding [3]. 
These encodings bring the receiver into synchronism 
after some delay and have the advantage that no time is 
wasted sending synchronizing information. However, the 
delay to achieve synchronism depends on the message 
being transmitted and so is somewhat unpredictable; 
occasional long delays may be encountered. 

Redundancy may be used to provide an encoding in
which the synchronization delay is always held below a
fixed limit. The comma-free encodings of Golomb, Welch,
and Delbrück [5] and Golomb, Gordon, and Welch [4]
are of this kind. These encodings have codes of fixed
length, say N digits. The encoding is a list of N-tuples
(codes) so chosen that, if the receiver starts to decode
when it is out of synchronism, then it always sees an
N-tuple which is not one of the codes in the list. After at
most N - 1 digit times, and at most N - 1 false starts,
the receiver finds the correct synchronism.

A simple example of a comma-free encoding with 
N = 5 is the list of six 5-tuples: 


01000 
OFT 050 
OF "0 
Onlet st 0: 
Ce OFF oT 
Oo 1: 


This encoding has redundancy $1 - (\log_2 6)/5 = 0.48$. It
will be shown that very small redundancies are possible
when N is large. All the encodings to be considered are
comma-free encodings of a special kind called prefix
synchronized encodings. A particular A-tuple (A < N)
is selected and called a synchronizing prefix. Each code
has the synchronizing prefix as its first A digits. The
remaining N - A digits of the codes are chosen so that,
in an encoded message, no block of A consecutive digits
can agree with the synchronizing prefix except blocks of
A digits taken at the beginnings of codes. For example, if
the synchronizing prefix is taken to be 1010, then
10101101100 is an allowed code but not 10100101011,
10101001101, nor 10101101110. Then the synchronizing
prefix is closely analogous to the sync pulse of the
teletype encoding. Since only A digits, instead of N, need


¹ The word code will be used for any one of certain strings of
binary digits which are allowed to be transmitted. The collection
of all codes is called an encoding. This usage permits a distinction
which is not commonly made (as in "The code for E is a dot, in
Morse code").


be remembered in determining synchronism, prefix 
synchronized encodings may be slightly easier to mechanize 
than comma-free encodings in general. 
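The example with prefix 1010 can be checked mechanically. The sketch below is one way to state the rule, assuming codes are written as strings of "0" and "1" and that every code is followed by another code starting with the same prefix; `is_allowed` is an illustrative name, not one used in the paper.

```python
def is_allowed(code, prefix):
    """Return True if `prefix` occurs in an encoded message only at the
    start of `code`, including windows that overlap the next code's prefix."""
    A = len(prefix)
    if not code.startswith(prefix):
        return False
    # Only the first A - 1 digits of the following prefix can take part in
    # a misplaced occurrence that begins inside this code.
    stream = code + prefix[:A - 1]
    return all(stream[i:i + A] != prefix for i in range(1, len(code)))


if __name__ == "__main__":
    for code in ["10101101100", "10100101011", "10101001101", "10101101110"]:
        print(code, is_allowed(code, "1010"))
```

The last of the four codes fails only because its final digits 10, followed by the next prefix, spell 1010 across the block boundary.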

The number G(N) of different N-tuples which such
encodings can have depends on the choice of the prefix.
Recurrence formulas, tables, and asymptotic formulas
for G(N) are contained herein. For fixed A, and sufficiently
large N, the prefix which maximizes G(N) is
11···1. However, if A may be varied, 11···1 is never as
good as a suitably chosen longer prefix. Tables support
the conjecture that G(N) is always maximized by a prefix
of the form 11···10. All comma-free encodings have
redundancy at least as great as $(\log_2 N)/N$. It is shown
that this bound is approached, for large N, by suitable
prefix synchronized encodings.


II. STATE DIAGRAMS


Suppose that a synchronizing prefix P, consisting of A
binary digits $p_1, p_2, \ldots, p_A$ has been chosen. Each code
is constructed by choosing $n$ more binary digits $x_1, x_2, \ldots, x_n$
to make an N-tuple ($N = A + n$)

$$(p_1, \ldots, p_A, x_1, x_2, \ldots, x_n).$$

The allowed choices of $x_1, \ldots, x_n$ are those for which no
A consecutive digits taken from the $(N + A - 2)$-tuple

$$(p_2, \ldots, p_A, x_1, x_2, \ldots, x_n, p_1, \ldots, p_{A-1})$$

agree with the A-tuple P.

These constraints may be reinterpreted graphically.
Imagine a conceptual machine which will scan an incoming
message digit by digit and ring a bell whenever the
synchronizing prefix P appears. Of course, such a machine
is easily designed using a shift register to remember the
A most recent digits. However, some of the $2^A$ states of
the shift register may be merged together. A machine with
only A + 1 states $S_0, S_1, \ldots, S_A$ may be described as
follows.

Let M denote the A-tuple formed by the A most recent
message digits. If M = P, the machine is to be in state
$S_0$ and the bell must ring.

For $k \ge 1$, the machine is to be in state $S_k$ if both of the
following conditions 1) and 2) hold:

1) The last A - k digits of M are the first A - k
digits of P.

2) For no integer $k'$ in $0 \le k' < k$ does 1) hold with k
replaced by $k'$.

Thus, in state $S_k$, at least k more digits must arrive before
the bell can ring. For example, if P = 0101 and M = 1100,
then the state is $S_3$; the three digits 101 must follow M
to produce P. More generally with P = 0101, the correspondence
between M and the state of the machine is
given in Table I. In Table I, x's represent digits which
may be either 0 or 1; for example, $S_2$ corresponds to 0001,
1001, and 1101.
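Conditions 1) and 2) amount to finding the longest beginning of P that ends M. A minimal sketch, assuming M and P are strings of "0" and "1" of equal length (the function name `state` is illustrative):

```python
from itertools import product


def state(M, P):
    """Index k of the state S_k for the A most recent digits M.

    k is the smallest integer such that the last A - k digits of M
    equal the first A - k digits of P (k = A always qualifies).
    """
    A = len(P)
    for k in range(A + 1):
        if M.endswith(P[:A - k]):
            return k


if __name__ == "__main__":
    P = "0101"
    for bits in product("01", repeat=len(P)):
        M = "".join(bits)
        print(M, "-> S%d" % state(M, P))
```

Run on P = 0101 this lists the sixteen values of M grouped as in Table I.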

State diagrams of machines for three choices of P are
drawn in Fig. 1. The states $S_0, S_1, S_2, \ldots$ are represented


TABLE I

State    M
$S_0$    0101 = P
$S_1$    x010
$S_2$    xx01 but not 0101
$S_3$    either xx00 or x110
$S_4$    xx11


P = 0101 


Fig. 1. 


by nodes labeled 0, 1, 2, ... . The transitions which may
occur when a new digit is received are shown as arrows.
The labels 0, 1 on the arrows denote the value of the new
digit which is required to cause the transition. It is not
difficult to verify that the states $S_0, S_1, \ldots$ so defined do
describe a valid machine (see Appendix I for details).

Given a state S, a path which begins at S and follows
arrows of the state diagram may be associated with the
sequence of binary digits which is encountered on the
arrows. For example, in Fig. 1 with P = 0101, the path
which starts at 3 and visits the states 2, 4, 4, 3, 3, 2, 1, 0, 4,
in that order, is associated with the binary sequence
111001011. This association provides a graphical way of
stating the constraints which have been placed on the
digits $x_1, \ldots, x_n$ of an allowed code. Starting from the
state $S_0$, the path $x_1, x_2, \ldots, x_n, p_1, \ldots, p_{A-1}$ must never
return to $S_0$. Such a return would indicate an appearance
of the A-tuple P in the block of digits

$$p_2, p_3, \ldots, p_A, x_1, \ldots, x_n, p_1, \ldots, p_{A-1}.$$

Alternatively, let a state S ($S \ne S_0$) be called an end
state if the path $p_1, p_2, \ldots, p_{A-1}$, starting at S, never
visits $S_0$. Then the path $x_1, \ldots, x_n$, starting at $S_0$, must
never return to $S_0$ and must end at an end state. In
Fig. 1 the end states appear as double circles.
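The transitions and end states can be generated from P alone. The sketch below assumes that state $S_k$ is represented by the known tail of the message, namely the first A - k digits of P; this is enough because condition 2) rules out any longer match. The names `next_state`, `follow` and `end_states` are illustrative.

```python
def next_state(k, digit, P):
    """State reached from S_k when `digit` ("0" or "1") arrives."""
    A = len(P)
    known = P[:A - k] + digit          # digits known to end the message
    for new_k in range(A + 1):
        if known.endswith(P[:A - new_k]):
            return new_k


def follow(start, digits, P):
    """List of states visited from S_start along the given digits."""
    visited, k = [], start
    for d in digits:
        k = next_state(k, d, P)
        visited.append(k)
    return visited


def end_states(P):
    """States other than S_0 from which p_1 ... p_{A-1} never reaches S_0."""
    A = len(P)
    return [k for k in range(1, A + 1) if 0 not in follow(k, P[:A - 1], P)]


if __name__ == "__main__":
    print(follow(3, "111001011", "0101"))   # the path quoted in the text
    print(end_states("0101"))
```

For P = 0101 the first call reproduces the path 2, 4, 4, 3, 3, 2, 1, 0, 4 given above.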


III. NUMBERS OF CODES


The list of all possible N-tuples (codes) of the form
$(p_1, \ldots, p_A, x_1, \ldots, x_{N-A})$ in which the x's satisfy the
requirements of Section II is a comma-free encoding. In
this section the number G(N) of such codes will be found.

G(N) depends not only on N but also on the A-tuple
prefix P. Fortunately, all $2^A$ possible prefixes do not have
different functions G(N). Two simple symmetry transformations
may be applied to a prefix P to obtain new
prefixes which have the same G(N) as P. One symmetry
is complementation, which replaces each digit $p_i$ of P by
$1 - p_i$. The second symmetry is reversal, which rewrites
the digits of P in reverse order ($p_i$ is replaced by $p_{A+1-i}$).
Applied to P = 01101, complementation and reversal
produce 10010 and 10110. To see that these transformations
leave G(N) unchanged, one may note that if
digits $x_1, \ldots, x_n$ are allowed to follow a prefix P, then
$1 - x_1, \ldots, 1 - x_n$ may follow the complement of P and
$x_n, \ldots, x_1$ may follow the reversal of P. Two different
prefixes P, P' will be called equivalent if P' can be obtained
by applying a complementation, or a reversal, or both
to P.
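For small N the number G(N) can be found by brute force, which also gives a direct check of the two symmetries. This is only a sketch for verification (exponential in N - A), with the illustrative name `G` standing for the quantity defined in the text.

```python
from itertools import product


def G(N, P):
    """Count the N-tuples with prefix P that satisfy the Section II constraints."""
    A = len(P)
    count = 0
    for xs in product("01", repeat=N - A):
        # The code followed by the first A - 1 digits of the next prefix.
        stream = P + "".join(xs) + P[:A - 1]
        if all(stream[i:i + A] != P for i in range(1, N)):
            count += 1
    return count


if __name__ == "__main__":
    P = "01101"
    complement = P.translate(str.maketrans("01", "10"))
    reversal = P[::-1]
    for N in range(6, 12):
        print(N, G(N, P), G(N, complement), G(N, reversal))
```

The three columns printed for each N should coincide, as the symmetry argument predicts.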
The $2^A$ prefixes may now be collected into a smaller
number of classes of equivalent prefixes. Since equivalent
prefixes have the same G(N), it suffices to compute G(N)
for one prefix from each equivalence class. Using Polya's
theorem [6], the number of equivalence classes of A-tuples
turns out to be

$$\tfrac{1}{4}\left(2^A + 2^{[A/2]+1}\right)$$

where the straight brackets [ ] denote "integer part."
For A = 2, 3, 4 every A-tuple is equivalent to one of the
following: 11, 10, 111, 110, 101, 1111, 0111, 1101, 0110,
0011, 0101.
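The class count can be confirmed by direct enumeration. The sketch below compares a brute-force count with the closed form $\frac{1}{4}(2^A + 2^{[A/2]+1})$; that closed form is an assumption here, chosen because it reproduces the 2, 3 and 6 classes listed for A = 2, 3, 4, and the printed formula in the original may be written differently.

```python
from itertools import product


def count_classes(A):
    """Count classes of binary A-tuples under complementation and reversal."""
    seen, classes = set(), 0
    swap = str.maketrans("01", "10")
    for bits in product("01", repeat=A):
        P = "".join(bits)
        if P in seen:
            continue
        classes += 1
        comp = P.translate(swap)
        seen.update({P, comp, P[::-1], comp[::-1]})
    return classes


def closed_form(A):
    """Assumed closed form for the number of classes."""
    return (2**A + 2**(A // 2 + 1)) // 4


if __name__ == "__main__":
    for A in range(2, 9):
        print(A, count_classes(A), closed_form(A))
```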

Some numerical results appear in Table II. The best
prefix P [i.e., the one with the largest G(N)] has a curious
dependence on N. If A is fixed at A = 3, then the choice
P = 110 is best until N = 14, P = 101 is best for N = 15,
..., 19, and P = 111 is best thereafter. For any fixed
A, the prefix 11···1 is ultimately best (this will be proved
in Theorem 2). However, Table II shows, even for A = 4,
that other prefixes are better until N becomes very large.

If a best prefix is sought without fixing A, then a prefix
11···10 with suitable A is always obtained from Table II.
It is conjectured that a similar result holds for any value
of N. To test this conjecture, J. B. Kruskal extended the
table to N < 134 and A = 6 on the IBM 704 computer.
These computations support the conjecture. The best
choice of A and the corresponding maximum G(N) are
given in Table III. This table also lists the number
$[N^{-1}2^N]$ which, for all P, is an upper bound on G(N) (see
Section V). For $N \le 28$ the tabulated values of G(N)
exceed $2^{N-6}$. Since prefixes with $A \ge 6$ have $G(N) \le 2^{N-6}$,
the values G(N) given for $N \le 28$ are proved best.
Comparison with the extended table proves the remaining
seven values best. The G(N) values for N > 35 were
computed using eight place accuracy. When $N \ge 43$ the
six-digit prefix 111110 is better than 11110 (the corresponding
values of G(43) are $7.9709 \times 10^{10}$ and $7.9627 \times 10^{10}$).
To compute G(N), for a particular choice of an A-tuple
prefix P, an Ath order recurrence was used. This recurrence
will be derived in Theorem 1 [see (2)]. In preparation
for the theorem some notation will be explained
here.


TABLE II
G(N) FOR DIFFERENT PREFIXES

  N     11     10    111    110    101   1111   1110   1010   1101
                                                 and           and
                                                1100          1001
  3      1      2
  4      1      3      1      2      1
  5      2      4      1      4      2      1      2      2      2
  6      3      5      2      7      4      1      4      3      3
  7      5      6      4     12      7      2      8      4      6
  8      8      7      7     20     12      4     15      9     11
  9     13      8     13     33     21      8     28     18     21
 10     21      9     24     54     37     15     52     32     39
 11     34     10     44     88     65     29     96     60     73
 12     55     11     81    143    114     56    177    115    136
 13     89     12    149    232    200    108    326    216    254
 14    144     13    274    376    351    208    600    405    474
 15    233     14    504    609    616    401   1104    764    885
 16    377     15    927    986   1081    773   2031   1440   1652
 17    610     16   1705   1596   1897   1490   3736   2710   3084
 18                 3136   2583   3329   2872   6872   5103   5757
 19                 5768   4180   5842   5536  12640   9612  10747
 20                10609   6764  10252  10671  23249  18101  20062
 21                                     20569  42762  34086  37451
 22                                     39648  78652  64192  69912


TABLE III
G(N) FOR P = 11···10

  N   A          G(N)   [N^-1 2^N]  |   N   A          G(N)   [N^-1 2^N] x 10^-6
  3   2             2            2  |  19   4        12,640   0.0276
  4   2             3            4  |  20   4        23,249   0.0524
  5   2 or 3        4            6  |  21   4        42,762   0.100
  6   3             7           10  |  22   5        82,392   0.190
  7   3            12           18  |  23   5       158,816   0.365
  8   3            20           32  |  24   5       306,128   0.70
  9   3            33           56  |  25   5       590,081   1.34
 10   3            54          102  |  26   5     1,137,418   2.58
 11   4            96          186  |  27   5     2,192,444   4.96
 12   4           177          341  |  28   5     4,226,072   9.6
 13   4           326          630  |  29   5     8,146,016   18.5
 14   4           600         1170  |  30   5    15,701,951   35.8
 15   4         1,104         2180  |  31   5    30,266,484   69.5
 16   4         2,031         4096  |  32   5    58,340,524   134
 17   4         3,736         7710  |  33   5   112,454,976   258
 18   4         6,872        14563  |  34   5   216,763,936   505
                                    |  35   5   417,825,921   981

Let F(n) = G(A + n). The theorem gives a formula
[see (1)] for a generating function

$$f(z) = \sum_{n} F(n)z^n.$$

Indeed, G(N) might also have been computed, with somewhat
more difficulty, by expanding f(z) [as given by (1)]
in a power series to get the coefficient of $z^{N-A}$. Theorem 1
will also mention some parameters $V_1, \ldots, V_A$, $T_1, \ldots, T_{A-1}$.
These numbers count certain kinds of paths in the
state diagram for the prefix P.

Let $T_n$ be the number of binary n-tuples which describe
paths in the state diagram starting at $S_0$ and ending at
$S_0$. Let $V_n$ denote the number of binary n-tuples describing
paths from $S_0$ to one of the end states. $T_n$ and $V_n$ both
count paths which may revisit $S_0$. The convention $T_0 = 1$,
$V_0 = 0$ will be adopted. The numbers $T_n$ and $V_n$ are



easily found by direct enumeration. For example, taking
P = 1101,

$$T_0 = 1, \quad T_1 = 0, \quad T_2 = 0, \quad T_3 = 1,$$
$$V_0 = 0, \quad V_1 = 2, \quad V_2 = 3, \quad V_3 = 6, \quad V_4 = 13.$$

In general, when $n \ge A$, $T_n$ counts all n-tuples for which
the last A digits form the prefix P; then,

$$T_n = 2^{n-A}, \quad n \ge A.$$

Likewise,

$$V_n = 2^{n-A}V_A, \quad n \ge A.$$

When $T_1, \ldots, T_{A-1}$, $V_1, \ldots, V_A$ have been found,
G(N) may be computed with the aid of the following
theorem.

Theorem 1: For N > A, G(N) is the coefficient of $z^{N-A}$
in the power series of the generating function

$$f(z) = \frac{(1 - 2z)(V_1 z + \cdots + V_{A-1}z^{A-1}) + V_A z^A}{(1 - 2z)(1 + T_1 z + \cdots + T_{A-1}z^{A-1}) + z^A}. \tag{1}$$

For N > 2A, G(N) satisfies a recurrence

$$G(N) + a_1 G(N - 1) + \cdots + a_A G(N - A) = 0 \tag{2}$$

where

$$a_k = \begin{cases} T_k - 2T_{k-1}, & k = 1, \ldots, A - 1 \\ 1 - 2T_{A-1}, & k = A. \end{cases}$$


Proof: $T_k F(n - k)$ of the $V_n$ paths from $S_0$ to an end
state visit $S_0$ for the last time at the kth step. It follows
that

$$\sum_{k=0}^{n} T_k F(n - k) = V_n.$$

Introducing generating functions $t(z) = \sum T_n z^n$ and
$v(z) = \sum V_n z^n$, this relationship provides f(z) = v(z)/t(z).
Simple formulas may be written for v(z) and t(z). For
example,

$$v(z) = V_1 z + V_2 z^2 + \cdots + V_{A-1}z^{A-1} + \frac{V_A z^A}{1 - 2z}.$$

Then, f(z) is expressed as a rational function

$$f(z) = \frac{(1 - 2z)(V_1 z + V_2 z^2 + \cdots + V_{A-1}z^{A-1}) + V_A z^A}{(1 - 2z)(T_0 + T_1 z + \cdots + T_{A-1}z^{A-1}) + z^A} \tag{3}$$

which proves (1).

Let the numerator and denominator polynomials of
(3) be called E(z) and D(z). The denominator polynomial,
when multiplied out, is $D(z) = 1 + d_1 z + \cdots + d_A z^A$.
By (3) the function D(z)f(z) is a polynomial E(z) of
degree $\le A$. Setting the coefficient of $z^n$ in D(z)f(z) equal
to zero one finds the recurrence

$$F(n) + d_1 F(n - 1) + \cdots + d_A F(n - A) = 0 \tag{4}$$

for n > A, which proves (2).




For example, taking P = 1101, the parameters $V_i$ and
$T_i$ were computed earlier. Then the theorem states that

$$f(z) = \frac{(1 - 2z)(2z + 3z^2 + 6z^3) + 13z^4}{(1 - 2z)(1 + z^3) + z^4}
= \frac{2z - z^2 + z^4}{1 - 2z + z^3 - z^4}$$

and

$$G(N) = 2G(N - 1) - G(N - 3) + G(N - 4), \quad N > 8.$$
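The recurrence (2) is easy to mechanize once the $T_k$ are known. The sketch below assumes the observation that, for $1 \le k \le A - 1$, $T_k$ is 1 when the last A - k digits of P equal its first A - k digits and 0 otherwise; the first A values of G are seeded by brute force. All function names are illustrative.

```python
from itertools import product


def brute_G(N, P):
    """G(N) by direct enumeration (used only for the seed values)."""
    A = len(P)
    count = 0
    for xs in product("01", repeat=N - A):
        stream = P + "".join(xs) + P[:A - 1]
        if all(stream[i:i + A] != P for i in range(1, N)):
            count += 1
    return count


def recurrence_coefficients(P):
    """Coefficients a_1, ..., a_A of recurrence (2)."""
    A = len(P)
    T = [1] + [1 if P[k:] == P[:A - k] else 0 for k in range(1, A)]
    return [T[k] - 2 * T[k - 1] for k in range(1, A)] + [1 - 2 * T[A - 1]]


def G_table(P, N_max):
    """G(N) for A < N <= N_max: brute force up to N = 2A, then (2)."""
    A = len(P)
    a = recurrence_coefficients(P)
    G = {N: brute_G(N, P) for N in range(A + 1, 2 * A + 1)}
    for N in range(2 * A + 1, N_max + 1):
        G[N] = -sum(a[k - 1] * G[N - k] for k in range(1, A + 1))
    return G


if __name__ == "__main__":
    print(recurrence_coefficients("1101"))
    print(G_table("1101", 22))
```

For P = 1101 the coefficients come out as -2, 0, 1, -1, matching the recurrence displayed above.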


Certain pairs of prefixes have the same set of values of
$V_1, \ldots, V_A$, $T_1, \ldots, T_{A-1}$. Then, by (1), they have the
same G(N). This explains why the pairs 1110, 1100 and
1101, 1001 are tabulated only once in Table II.

If exact values beyond the range of Table II are needed
they may be obtained by the recurrence (4). The coefficients
in (4) appear in the denominators of the functions
f(z) listed in Table IV.


TABLE IV
GENERATING FUNCTIONS OF F(n)

Prefix           f(z)
11               $z/(1 - z - z^2)$
10               $(2z - z^2)/(1 - 2z + z^2)$
111              $z/(1 - z - z^2 - z^3)$
110              $(2z - z^3)/(1 - 2z + z^3)$
101              $(z + z^3)/(1 - 2z + z^2 - z^3)$
1111             $z/(1 - z - z^2 - z^3 - z^4)$
1110 and 1100    $(2z - z^4)/(1 - 2z + z^4)$
1010             $(2z - z^2)/(1 - 2z + z^2 - 2z^3 + z^4)$
1101 and 1001    $(2z - z^2 + z^4)/(1 - 2z + z^3 - z^4)$


IV. ASYMPTOTIC FORMULAS


When n is large F(n) grows exponentially. The rate of
growth is determined by the poles of f(z) which have the
smallest absolute value. If all poles of f(z) lie outside a
circle $|z| \le 1/w$ then F(n) is of order $o(w^n)$. Since the
coefficients F(n) of f(z) are positive, one of the poles of
f(z) of smallest absolute value lies on the positive real
axis (see [7], Chapter VII). Typically, one finds that the
smallest positive real pole, for example, at z = 1/W, is a
simple pole and that there are no other poles of absolute
value 1/W. In that case, it follows from (3) that

$$F(n) \sim \{-E(1/W)/D'(1/W)\}\,W^{n+1} \tag{5}$$

asymptotically for large n. For, under the conditions
stated, the function

$$h(z) = f(z) - E(1/W)/\{(z - 1/W)\,D'(1/W)\}$$

has only the poles of f(z) other than 1/W. Since these
poles have magnitudes greater than 1/W, the nth coefficient
of the series for h(z) is $o(W^n)$ and then (5) follows.

Since poles of f(z) are zeros of the denominator polynomial
D(z), W is the largest real positive root of
D(1/W) = 0. Equivalently, t(1/W) = 0 where

$$t(z) = 1 + T_1 z + \cdots + T_{A-1}z^{A-1} + \frac{z^A}{1 - 2z}. \tag{6}$$

Since $F(n) \le 2^n$, only the range $W \le 2$, or $z \ge 1/2$, need
be considered. At z = 1/2, t(z) has a simple pole with
$t(1/2 + 0) = -\infty$. For larger z, the term $z^A/(1 - 2z)$, and
hence also t(z), increases monotonically. At the value
z = A/(2A - 2),

$$t(z) \ge 1 + \frac{z^A}{1 - 2z} = 1 - (A - 1)\left(2 - \frac{2}{A}\right)^{-A}.$$

If $A \ge 3$, then $2 - 2/A \ge 4/3$ so that

$$t\left(\frac{A}{2A - 2}\right) \ge 1 - (A - 1)(3/4)^A > 0.$$

Then, for $A \ge 3$, the real pole is a simple one and

$$2 - \frac{2}{A} < W < 2. \tag{7}$$

In one case, when A = 2 (P = 10), the pole occurs at 1
and is a double pole; then (5) does not apply.

It is of some interest to find those prefixes P of length
A which produce the largest and the smallest values of
W.

Theorem 2: Of all A-digit prefixes P, the one which makes
G(N) grow most rapidly for large N is P = 11···1. Slowest
growth of G(N) for large N is obtained with any one of the
A - 1 prefixes 11···10, 11···100, ..., 10···0. However,
for all N > A + 1, the number of codes G(N) obtained with
the A-digit prefix 11···1 is never as great as the number
obtained with the (A + 1)-digit prefix 11···10.

Proof: In the range defined by (7), t(z) is a monotone
increasing function of both z and $T_1, \ldots, T_{A-1}$. Then,
an increase in one of the $T_i$ will require a decrease in the
value of z which satisfies t(z) = 0. It follows that the
largest W is obtained if $T_1 = \cdots = T_{A-1} = 1$ and the
smallest W is obtained if $T_1 = \cdots = T_{A-1} = 0$. The
former case corresponds to the prefix P = 11···1. The
latter corresponds to any of the prefixes 11···10,
11···100, ..., 10···0.

Although when A is fixed, the choice P = 11···1
produces the most rapid growth of F(n) for large n, this
choice is never the best one when N is fixed and A can
be varied. For N > A + 1 every N-tuple which is allowed
for the prefix 11···1 (A ones) is also allowed for the
prefix 11···10 (A ones and 1 zero). The converse is not
true; for example, when A = 3 and N = 10, the 10-tuple

1110101111

is allowed for the prefix 1110 but not for 111. Then 11···10
is always a better prefix than 11···1.


As an application, let $D_A(z)$ denote the denominator
polynomial for the prefix 11···1 (A ones):

$$D_A(z) = 1 - z - z^2 - \cdots - z^A.$$

Let $1/W_A$ be the smallest real positive root of $D_A(z) = 0$.
The denominator polynomial for the prefix 11···10
(A - 1 ones and 1 zero) is

$$1 - 2z + z^A = (1 - z)D_{A-1}(z);$$

then for the prefix 11···10, $W = W_{A-1}$. Theorem 2 now
shows for all prefixes of length A, that

$$W_{A-1} \le W \le W_A, \tag{8}$$

which is an improvement over (7).
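The bounds (8) can be spot-checked numerically for A = 4. The sketch below finds 1/W by bisection on the denominator polynomial D(z) between z = 1/2, where D is positive, and z = A/(2A - 2), where the argument leading to (7) makes it negative. It assumes $A \ge 3$ and builds D(z) from the overlap rule for $T_k$ ($T_k = 1$ when the last A - k digits of P equal its first A - k digits); the function names are illustrative.

```python
from itertools import product


def denominator(P):
    """Coefficients [1, d_1, ..., d_A] of D(z) for prefix P."""
    A = len(P)
    T = [1] + [1 if P[k:] == P[:A - k] else 0 for k in range(1, A)]
    return [1] + [T[k] - 2 * T[k - 1] for k in range(1, A)] + [1 - 2 * T[A - 1]]


def growth_rate(P, iterations=200):
    """W for prefix P (length A >= 3), by bisection on D(z)."""
    A = len(P)
    d = denominator(P)

    def D(z):
        return sum(c * z**k for k, c in enumerate(d))

    lo, hi = 0.5, A / (2 * A - 2)      # D(lo) > 0 > D(hi)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if D(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 1 / lo


if __name__ == "__main__":
    for bits in product("01", repeat=4):
        P = "".join(bits)
        print(P, round(growth_rate(P), 6))
```

Every four-digit prefix should give a value between $W_3 \approx 1.8393$ and $W_4 \approx 1.9276$.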
When $A \ge 2$ all zeros of $D_A(z)$, except $z = 1/W_A$, lie
outside the unit circle; see Appendix II for a proof. Then,
taking P = 11···1, not only does (5) apply but the error
$\to 0$ as $n \to \infty$. The integer F(n) becomes just the closest
integer to the expression (5) when n is large. Of course,
to use (5) in this manner $W_A$ must be computed to high
accuracy; the recurrence (4) is still the better way of
computing F(n) exactly.

Similarly, taking P = 11···10, the two poles
$z = 1/W_{A-1}$ and $z = 1$ of f(z) supply an asymptotic
formula ($A \ge 3$)

$$F(n) \sim \frac{W_{A-1}^{\,n+1}}{2 - A\,W_{A-1}^{\,1-A}} + \frac{1}{2 - A}.$$

The dominant term in this formula is given by (5) with
$W = W_{A-1}$, $E(z) = 2z - z^A$, $D(z) = 1 - 2z + z^A$; note
that $E(1/W_{A-1}) = 1$ since $1/W_{A-1}$ is a zero of the polynomial
$1 - 2z + z^A = 1 - E(z)$. The constant term
1/(2 - A) is contributed by the pole at z = 1 and is
also obtained from (5), now with W = 1. Again, the
nearest integer to the number given by the formula is
the exact value of F(n) when n is large. The following
theorem gives a bound which holds for all N > A.

Theorem 3: If the synchronizing prefix is the A-tuple
11···10, then

$$G(N) > (W_{A-1})^{N-A}, \quad N > A. \tag{9}$$

Proof: Write the generating function in the form

$$f(z) = \frac{2z - z^A}{1 - 2z + z^A} = \frac{1}{1 - 2z + z^A} - 1 = \frac{1}{(1 - z)D_{A-1}(z)} - 1$$

or

$$D_{A-1}(z)f(z) = \frac{1}{1 - z} - D_{A-1}(z)
= 2z + 2z^2 + \cdots + 2z^{A-1} + z^A + z^{A+1} + \cdots.$$

Equating coefficients of $z^n$, one finds

$$F(n) = 1 + F(n - 1) + \cdots + F(n - A + 1) \tag{10}$$

for $n \ge A$. Now the bound (9) will follow by induction
on n. When n = 1, ..., A - 1,

$$F(n) = 2^n > (W_{A-1})^n.$$

If $n \ge A$ and if (9) holds for 1, 2, ..., n - 1,

$$F(n) > 1 + (W_{A-1})^{n-1} + \cdots + (W_{A-1})^{n-A+1}$$

by (10). But, since $D_{A-1}(1/W_{A-1}) = 0$,

$$F(n) > 1 + (W_{A-1})^n,$$

which verifies (9).
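Recurrence (10), the bound (9) and the asymptotic formula for the prefix 11···10 can be tried out together. This is a sketch under stated assumptions: $F(n) = 2^n$ for $1 \le n < A$ as in the proof, $A \ge 3$, and $W_{A-1}$ obtained by bisection on $D_{A-1}(z)$ in the interval (1/2, 1). The names are illustrative.

```python
def F_values(A, n_max):
    """F(0..n_max) for the A-digit prefix 11...10, using recurrence (10)."""
    F = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        if n < A:
            F[n] = 2**n
        else:
            F[n] = 1 + sum(F[n - i] for i in range(1, A))
    return F


def W_previous(A, iterations=200):
    """W_{A-1}: reciprocal of the root of 1 - z - ... - z^(A-1) in (1/2, 1)."""
    def D(z):
        return 1 - sum(z**k for k in range(1, A))

    lo, hi = 0.5, 1.0                  # D(lo) > 0 > D(hi) when A >= 3
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if D(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 1 / lo


def asymptotic_F(A, n):
    """Two-pole asymptotic estimate of F(n) for the prefix 11...10."""
    W = W_previous(A)
    return W**(n + 1) / (2 - A * W**(1 - A)) + 1 / (2 - A)


if __name__ == "__main__":
    A = 5
    F, W = F_values(A, 30), W_previous(A)
    for n in (10, 20, 30):
        print(n, F[n], W**n, asymptotic_F(A, n))
```

With A = 5 and n = 30 the first function returns the Table III entry for N = 35.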


In applying (8) and (9) a series for $W_A$ is useful.

Theorem 4: If $A \ge 2$, then

$$W_A = 2 - 2\sum_{n=1}^{\infty}\frac{1}{n}\binom{nA + n - 2}{n - 1}2^{-n(A+1)}. \tag{11}$$

A proof is given in Appendix III.
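A truncated form of the series can be compared with the defining equation $W = 2 - W^{-A}$ from Appendix III. The sketch below assumes the series has the Lagrange-inversion form $x = \sum_{n \ge 1} \frac{1}{n}\binom{nA+n-2}{n-1} w^n$ with $w = 2^{-A-1}$ and $W_A = 2(1 - x)$; the number of terms kept is an arbitrary choice.

```python
from math import comb


def W_series(A, terms=200):
    """Approximate W_A from the first `terms` terms of the series."""
    x = sum(comb(n * A + n - 2, n - 1) / (n * 2**(n * (A + 1)))
            for n in range(1, terms + 1))
    return 2 * (1 - x)


if __name__ == "__main__":
    for A in (2, 3, 4, 6):
        W = W_series(A)
        print(A, W, 2 - W**(-A))       # the two columns should agree
```

For A = 2 the value is the golden ratio 1.618..., the growth rate of the Fibonacci numbers in the first column of Table II.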


V. REDUNDANCY ESTIMATES


Delbrück, Golomb, Gordon and Welch [4, 5] obtain an
upper bound on the number of codes which a comma-free
encoding may contain. This bound is a number of equivalence
classes of periodic sequences with least period N.
Two periodic sequences are considered equivalent if they
differ only in phase (for example ···110110110··· and
···101101101··· are equivalent). An exact formula for
this number of classes is given in [4] and [5]; see also [2].
For our purposes, a simpler bound

$$G(N) \le 2^N/N \tag{12}$$

suffices. This bound follows from the bound of Delbrück,
Golomb, Gordon, and Welch when it is noted that each
sequence of least period N is equivalent to just N - 1
other sequences and that no more than $2^N$ sequences have
least period N. In Table III the best values of G(N) fall
short of $2^N/N$ by factors of the order of 1/2.

From (12) it is clear that a redundancy at least as
great as $(\log_2 N)/N$ is necessary for a comma-free encoding
using N-tuples.

Redundancies of the same small order of magnitude
may be achieved using the prefix synchronized encoding,
as will be shown presently. First, however, a very simple
example will be given in which the redundancy is roughly
$2N^{-1/2}$. The synchronizing prefix will be 11···1 (A
digits) and $N = A^2 + 1$. Instead of using all N-tuples
which satisfy the constraints of Section II, the stronger
constraints

$$x_1 = x_{A+1} = x_{2A+1} = \cdots = x_{A^2-A+1} = 0$$

are imposed. Thus, the typical N-tuple has the appearance
(for A = 4)

11110xxx0xxx0xxx0

where the x's represent digits which are unrestricted.
Since 2A of the $A^2 + 1$ digits are fixed, the redundancy
is $2A/(A^2 + 1)$, which equals $2(N - 1)^{1/2}/N$. A possible
advantage of this encoding is that the unrestricted digits
(x's) may be taken directly from a given binary message
without further encoding.
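The simple scheme above can be written out directly. The sketch below assumes the message is a string of $(A - 1)^2$ binary digits and inserts a 0 before each group of A - 1 message digits and one more at the end; the helper that checks synchronization is included only to confirm the construction, and both names are illustrative.

```python
from itertools import product


def encode_block(message, A):
    """Build the (A*A + 1)-digit block 11...1 0xxx 0xxx ... 0 from the message."""
    if len(message) != (A - 1) ** 2:
        raise ValueError("message must contain (A - 1)**2 digits")
    groups = [message[i:i + A - 1] for i in range(0, len(message), A - 1)]
    return "1" * A + "".join("0" + g for g in groups) + "0"


def prefix_only_at_start(block, A):
    """True if A consecutive ones occur only at the start of the block,
    even when the block is followed by the next block's prefix."""
    P = "1" * A
    stream = block + P[:A - 1]
    return all(stream[i:i + A] != P for i in range(1, len(block)))


if __name__ == "__main__":
    A = 4
    for bits in product("01", repeat=(A - 1) ** 2):
        block = encode_block("".join(bits), A)
        assert len(block) == A * A + 1 and prefix_only_at_start(block, A)
    print(encode_block("111111111", A))
```

With A = 4, nine message digits travel in each block of 17, a redundancy of 8/17.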

The following theorem cites a family of prefix synchronized
encodings which have redundancies approximating
$(\log_2 N)/N$.

Theorem 5: Choose the synchronizing prefix to be P =
11···10 with

$$A = [\log_2 (N \log_2 e)].$$

Then a constant C exists such that

$$G(N) > \frac{1}{2\log_2 e}\,\frac{2^N}{N}\left(1 - C\,\frac{\log_2 N}{N}\right).$$


Proof: The bound will follow from (9) using a suitable
estimate of $W_{A-1}$. It follows from (11) (see Appendix III)
that

$$W_{A-1} = 2\{1 - 2^{-A} + O(A\,2^{-2A})\}
= 2\{1 - 2^{-A} + O(N^{-2}\log_2 N)\}.$$

Then the lower bound (9) becomes

$$G(N) > 2^{N-A}\exp(-N 2^{-A})\{1 + O(N^{-1}\log_2 N)\}. \tag{13}$$

Let a real number $a$ be defined by

$$A = [\log_2 (N \log_2 e)] = \log_2 (N/a). \tag{14}$$

Then (13) becomes

$$G(N) > N^{-1}2^N a e^{-a}\{1 + O(N^{-1}\log_2 N)\}. \tag{15}$$

By (14), it follows that

$$(\log_2 e)^{-1} \le a < 2(\log_2 e)^{-1} \tag{16}$$

and this result may now be restated in the form

$$a e^{-a} \ge (2\log_2 e)^{-1}. \tag{17}$$

To derive (17) from (16), note that the function $a e^{-a}$
grows monotonically from 0 at a = 0 to $e^{-1}$ at a = 1,
and then decreases monotonically for a > 1. At both
end-points of the interval (16), $a e^{-a}$ has the value
$(2\log_2 e)^{-1}$; then (17) is satisfied within the interval. Now
the factor $a e^{-a}$ in (15) may be bounded by (17) to prove
the theorem.

For large N, Theorem 5 shows that the upper bound
$N^{-1}2^N$ on G(N) can be achieved to within a constant
factor $(2\log_2 e)^{-1} = 0.346$.
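The size of the constant can be seen on a few tabulated cases. The sketch below assumes the choice $A = [\log_2 (N \log_2 e)]$ of Theorem 5 and computes G(N) for the prefix 11···10 from recurrence (10), with $F(n) = 2^n$ for n < A; it then prints the ratio of G(N) to $2^N/N$. The names are illustrative, and the three values of N are simply convenient samples.

```python
from math import e, floor, log2


def prefix_length(N):
    """The prefix length A = [log2(N log2 e)] used in Theorem 5."""
    return floor(log2(N * log2(e)))


def G_ones_then_zero(N, A):
    """G(N) for the A-digit prefix 11...10, via recurrence (10)."""
    F = [0] * (N - A + 1)
    for n in range(1, N - A + 1):
        if n < A:
            F[n] = 2**n
        else:
            F[n] = 1 + sum(F[n - i] for i in range(1, A))
    return F[N - A]


def fraction_of_bound(N):
    """G(N) divided by the upper bound 2**N / N."""
    return G_ones_then_zero(N, prefix_length(N)) * N / 2**N


if __name__ == "__main__":
    for N in (18, 22, 35):
        print(N, prefix_length(N), round(fraction_of_bound(N), 4))
```

In these samples the ratio stays above $(2\log_2 e)^{-1} \approx 0.3466$ and below 1; the theorem itself only guarantees the lower figure up to the correction term.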


APPENDIX I


STATE DIAGRAMS


The states So, Si, --: , Sa, defined in Section II have 
been obtained by merging some of the 2* states of the 
shift register machine. It remains to verify that these 
combined states are indeed states of a valid machine. 
Consider two input sequences, one ending in an A-tuple 
M, the other ending in an A-tuple M’, both of which put 
the machine in state S,. It must be shown that if both 
M and M’ are followed by the same next digit d, then the 
two sequences ending in Md and M’'d correspond to the 
same new state. 

Suppose $2 \le k \le A - 1$. To correspond to $S_k$, the last
$A - k$ digits of both $M$ and $M'$ must be $p_1, p_2, \cdots, p_{A-k}$.
If $d = p_{A-k+1}$, then $S_{k-1}$ is the new state following
both $M$ and $M'$. If $d$ is not $p_{A-k+1}$, then the new states for
$Md$ and $M'd$ are $S_K$ and $S_{K'}$ where both $k \le K$ and
$k \le K'$. However the sequence of $K$ digits which will
lead from $Md$ to the $A$-tuple $P$ will also lead from $M'd$
to $P$; thus $K' \le K$. Similarly $K \le K'$. Then $K = K'$,
which was to be proved. The cases $k = 1$ and $k = A$ may
be handled by a similar argument.


APPENDIX II


ZEROS OF D(z) 


The only zero of $D_A(z)$ in the unit circle $|z| < 1$ is the
real zero at $z = 1/W_A$. This result is obtained by applying
Rouché's theorem [1] to the polynomial $(1 - z)D_A(z) = 1 - 2z + z^{A+1}$.
At a point $z = e^{iu}$ on the unit circle,

$$\left| \frac{z^{A+1}}{1 - 2z} \right| = \frac{1}{|1 - 2z|} = \frac{1}{(5 - 4 \cos u)^{1/2}} < 1$$

except at the angle $u = 0$ ($z = 1$). In the neighborhood of
$z = 1$, say at $z = 1 - b$,

$$\left| \frac{z^{A+1}}{1 - 2z} \right| = \left| 1 - (A - 1)b + O(b^2) \right|.$$

It follows that $|z^{A+1}/(1 - 2z)| < 1$ everywhere on a
contour $C$ consisting of the unit circle with a small indentation
to the left of the point $z = 1$. Then, inside $C$,
$1 - 2z + z^{A+1}$ and $1 - 2z$ have the same number of zeros,
namely only one.
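A small numerical sketch of the two facts used above: the ratio $|z^{A+1}/(1 - 2z)|$ on the unit circle, and the single real zero of $1 - 2z + z^{A+1}$ inside it. The bracket $(0.5, 0.9)$ used for the bisection is an assumption made for this illustration and is only appropriate for $A \ge 2$; the function names are not from the paper.

```python
import cmath
import math


def rouche_ratio(u: float, A: int) -> float:
    """|z^(A+1) / (1 - 2z)| at the point z = exp(iu) of the unit circle."""
    z = cmath.exp(1j * u)
    return abs(z ** (A + 1)) / abs(1 - 2 * z)


def real_root_inside(A: int, tol: float = 1e-14) -> float:
    """Bisection for the zero of 1 - 2z + z^(A+1) in (0.5, 0.9), A >= 2."""
    lo, hi = 0.5, 0.9            # the polynomial is > 0 at 0.5 and < 0 at 0.9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 1 - 2 * mid + mid ** (A + 1) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


A = 2
z0 = real_root_inside(A)
W = 1.0 / z0                     # candidate for W_A
print(z0, W, 2 - W ** (-A))      # W should satisfy W = 2 - W^(-A)
```

For $A = 2$ the polynomial factors as $(1 - z)(1 - z - z^2)$, so the zero found is $(\sqrt{5} - 1)/2$ and $W_2$ is the golden ratio.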


APPENDIX III 


FORMULA FOR W, 


$W_A$ is the real root of $W = 2 - W^{-A}$ in the interval
(7). The substitution $W = 2(1 - x)$ leads to

$$x(1 - x)^A = 2^{-A-1}.$$

This equation will be solved as a special case $w = 2^{-A-1}$
of the equation

$$x(1 - x)^A = w.$$

Using Lagrange's inversion formula [1], a power series
about the point $w = 0$ is found for $x$:

$$x = \sum_{n=1}^{\infty} \frac{w^n}{n!} \left[ \frac{d^{n-1}}{dz^{n-1}} (1 - z)^{-nA} \right]_{z=0} = \sum_{n=1}^{\infty} \frac{w^n}{n} \binom{n(A+1) - 2}{n - 1}.$$

Formula (11) is obtained by setting $w = 2^{-A-1}$. The
substitution is allowable if the series for $x$ converges at
this value of $w$. To check convergence, examine the ratio
$R_n$ of the $(n + 1)$st term to the $n$th in the series for $x$:

. mMA+1)-1 7 nA+1 +k 


ae 


R, = 

moe 1 kao: || SeA ae 
i Apia ln T {Att ke \ 
aaa eh Bees ~ Ama + BS’ 


R, @Re = wA “(A aye 


Qe Vand, Ak 2: 


A 
| pr, < 444 (2) cy 


and the series converges.

In the proof of Theorem 5, an estimate is needed for
the error made in approximating $W_A$ by the first few terms
of the series (11). Since $R_n < R_\infty$, the error committed is
no more than $(1 - R_\infty)^{-1}$ times the first neglected term.
For large $A$, the factor $(1 - R_\infty)^{-1}$ approaches 1.
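The series can be checked against the defining equation. The sketch below assumes the binomial form of the Lagrange coefficients, namely that the coefficient of $w^n$ in $x$ is $\binom{n(A+1)-2}{n-1}/n$, which follows from applying the inversion formula to $x = w(1 - x)^{-A}$; the function names and the default number of terms are choices made here, not the paper's.

```python
import math
from fractions import Fraction
from math import comb


def x_series(A: int, terms: int = 120) -> float:
    """Partial sum of x = sum_n (w^n / n) * C(n(A+1) - 2, n - 1) at w = 2^-(A+1)."""
    total = Fraction(0)
    for n in range(1, terms + 1):
        # Exact rational term; w^n = 2^(-(A+1) n).
        total += Fraction(comb(n * (A + 1) - 2, n - 1), n * 2 ** ((A + 1) * n))
    return float(total)


def W_from_series(A: int, terms: int = 120) -> float:
    """W_A recovered from the substitution W = 2 (1 - x)."""
    return 2.0 * (1.0 - x_series(A, terms))


for A in (2, 3, 4):
    W = W_from_series(A)
    print(A, W, W - (2 - W ** (-A)))   # residual of W = 2 - W^(-A)
```

Exact rational arithmetic is used only to keep the large binomial coefficients from overflowing; the convergence is geometric, so a modest number of terms suffices for $A \ge 3$, while $A = 2$ converges more slowly.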




REFERENCES 


[1] E. T. Copson, "An Introduction to the Theory of Functions of a Complex Variable," Oxford University Press, New York, N. Y.; 1934.

[2] N. J. Fine, "Classes of periodic sequences," Illinois J. Math., vol. 2, pp. 285-302; June, 1958.

[3] E. N. Gilbert and E. F. Moore, "Variable-length binary encodings," Bell Syst. Tech. J., vol. 38, pp. 933-967; July, 1959.

[4] S. W. Golomb, B. Gordon, and L. R. Welch, "Comma-free codes," Canadian J. Math., vol. 10, no. 2, pp. 202-209; 1958.

[5] S. W. Golomb, L. R. Welch, and M. Delbrück, "Construction and properties of comma-free codes," Biol. Med. Danske Vid. Selsk., vol. 23, no. 9, pp. 1-34; 1958.

[6] John Riordan, "An Introduction to Combinatorial Analysis," John Wiley and Sons, Inc., New York, N. Y.; 1958.

[7] E. C. Titchmarsh, "The Theory of Functions," 2nd ed., Oxford University Press, New York, N. Y.; 1939.


Analytic Inversion of a Class of Covariance Matrices* 


WILLIAM A. JANOS, MEMBER, IRE 


Summary—The sample covariance matrix arising out of finite 
memory linear least squares estimation over a set of equally spaced 
time points, is inverted by spectral methods (operationally referred 
to as the z transform). It is shown that the complexity of the problem 
depends only upon the complexity of the input correlation function. 
The final solution is shown to reduce to the inversion of a triangular 
system of linear equations of an order less than half the degree 
of the denominator of the input power spectral density function. 


LIST OF BASIC SYMBOLS


These are the different symbols used to denote functions, 
variables, constants, and possibly unconventional math- 
ematical operations used in the presentation. The same 
symbol with appropriate sub or superscripts may be used 
for different purposes. Indices of summation, dummy 
variables and common mathematical symbols have been 
omitted. 


C = with subscripts, coefficient of exponential in auto- 
correlation function; 

D = number of exponentials in autocorrelation func- 
tion; 

M = filter memory, number of intervals T long;

N = degree of factor of numerator of spectral density 
functions; 

P = polynomial of degree not greater than D — 1, 
with prime also; 

Q = with subscripts, primed and bar, polynomial of degree D - N - 1;


* Received by the PGIT, December 6, 1959. 
† Raytheon Co., Wayland, Mass.


q = with sub and superscripts, coefficients of Q (above);

R = with bar and prime, polynomial of degree less than N;

r = with sub and superscripts, coefficients of R (above);

T = sampling interval;

v = defined to be zero over memory interval (0, MT);

V* = z transform of v;

W = optimal weighting sequence, star superscript denotes its z transform;

$\phi$ = autocorrelation function;

$\Phi^*$ = z transform of $\phi$, spectral density function;

$\varphi$ = factor of numerator or denominator of $\Phi^*$ depending on N or D subscripts;

$\alpha$ = with subscripts, decay factor in exponentials of $\phi$, without subscript used in the Example at the end of this paper to condense notation;

$\beta$ = signal autocorrelation function decay factor used in the Example;

$\gamma$ = noise autocorrelation function decay factor used in the Example;

z = spectral term, in "z" transform;

$\chi$ = with subscripts, coefficients of D - 1 degree polynomial in $z$;

$\Psi$ = arbitrary right side of Wiener-Hopf equation;

$\psi$ = with subscripts, coefficient of D - 1 degree polynomial in $z^{-1}$; and

$\sigma$ = with subscripts, root mean signal or noise power.




The following symbols are defined in context and are 
used to condense notation. 


SSF Qy awh 


INTRODUCTION 


In cases of finite memory least squares estimation of
some linear functional of a discrete, uniformly
sampled stationary time series, the optimalizing
condition is expressed by the integral equation [1-3]
(in the Stieltjes sense):

$$\sum_{n=0}^{M} W(nT)\,\phi[(m - n)T] = \Psi(mT), \qquad m \in (0, M). \qquad (1)$$

$W(nT)$ is the weighting sequence or filter to be determined;
$\phi(nT)$, the autocorrelation function of the input signal
plus noise, is assumed known and consisting of a linear
combination of exponentials [4];

$$\phi(mT) = \sum_{k=1}^{D} C_k e^{-\alpha_k T |m|}, \qquad \mathrm{Re}\,\alpha_k > 0, \qquad (2)$$

and $\Psi(mT)$ may be assumed arbitrary.

The solution of (1) may be obtained by considering a
related problem: to determine $W_\mu(nT)$ such that

$$\sum_{n=0}^{M} W_\mu(nT)\,\phi[(m - n)T] = \delta_{\mu,m}, \qquad \text{both } m, \mu \in (0, M); \qquad (3)$$

then

$$W(mT) = \sum_{n=0}^{M} W_m(nT)\,\Psi(nT), \qquad m \in (0, M), \qquad (4)$$

or $W_\mu(nT)$ is the $\mu, n$th element of the inverse to the covariance
matrix.

The conventional way of solving (1) or (3) has been to
establish a square system of $M$ equations, one for each
required time point, and then to invert this system by
algebraic techniques. In general, this requires the inversion
of symmetrical matrices of rank equal to the memory $M$
of the filter $W$.
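For reference, the conventional time domain route can be sketched as follows. This is only an illustration under stated assumptions: the index interval is taken as the closed set $0, 1, \cdots, M$ (so the matrix built here has $M + 1$ rows), the exponential model (2) supplies the entries, and `covariance_matrix` and `invert` are names introduced for this sketch.

```python
import math


def covariance_matrix(M, C, alpha, T):
    """Matrix with entries phi((m - n) T), m, n = 0..M, phi taken from (2)."""
    def phi(k):
        return sum(c * math.exp(-a * T * abs(k)) for c, a in zip(C, alpha))
    return [[phi(m - n) for n in range(M + 1)] for m in range(M + 1)]


def invert(matrix):
    """Gauss-Jordan elimination with partial pivoting (plain Python lists)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("matrix is numerically singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Row mu of the inverse plays the role of the sequence W_mu(nT) in (3).
Phi = covariance_matrix(5, [1.0, 0.5], [0.3, 1.2], 1.0)
W = invert(Phi)
```

The cost of this route grows with the memory $M$, which is the point of comparison for the reduced systems discussed next.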


Wise [5] and Siddiqui [6] have shown [7] that the
problem of inverting such covariance matrices of rank $M$
is reducible to the inversion of certain related matrices
of rank $2D$, where $2D$ is the degree of the denominator
of the power spectral density function. The former, using
a semi-infinite matrix representation of translation
operations instead of a spectral one, obtains a concise and
elegant formal solution. However, but for a special case,
the solution requires the inverse of semi-infinite triangular
matrices. The latter treats the problem in statistical
language, as a transformation of variables under a multivariate
normal distribution. The symmetry and persymmetry
properties of the covariance matrices are exploited
to obtain a solution dependent on the inverse of a
covariance matrix of rank $2D$.

The author independently derived similar conclusions
about the equivalent problem of inverting (1). The
required number of linearly independent equations to be
solved equalled $2D$, where the memory $M$ is greater than,
or equal to, $2D$. But here, since (1) is of a time origin
invariant form, the methods of time invariant harmonic
analysis were used more extensively to obtain a finite,
reduced triangular system of linear equations.

Thus, the purpose of the following investigation is to 
solve (3) by means of the discrete process harmonic
analytical methods [8, 4], operationally referred to as the 
two-sided z transform [9]. The mode of approach in the 
transform domain is similar to that taken by Youla [10], 
although in the latter’s article, continuous processes are 
dealt with in the solution of an eigenvalue problem. 


PROPERTIES OF SPECTRAL DENSITY FUNCTION


Since $\phi(mT)$, given by (2), is a combination of decaying
exponentials, its two-sided z transform exists and is
summable in closed form. It is given by

$$\Phi^*(z) = \sum_{m=-\infty}^{\infty} \phi(mT)\, z^{-m} = \sum_{k=1}^{D} C_k \frac{1 - e^{-2\alpha_k T}}{(1 - e^{-\alpha_k T} z^{-1})(1 - e^{-\alpha_k T} z)} \qquad (5)$$

for $-\alpha_1 T < \log |z| < \alpha_1 T$, hence for $|z| = 1$, or values of $z$
on the unit circle (where $\alpha_1$ is the least of the $\alpha_k$'s).
Notice that the $\alpha_k$'s are assumed real in (5). If they are
complex, then with each term shown on the right of (5)
corresponding to an index $k$, there would have to be
added a similar one, but with $\alpha_k$ replaced by its conjugate,
since the correlation function is real-valued and even.
Then the condition for convergence would be
$-\mathrm{Re}\,\alpha_1 T < \log |z| < \mathrm{Re}\,\alpha_1 T$. The use of complex exponentials
will only add to the complexity of a calculation without
contributing anything of greater generality. Hence, only
real coefficients will be used in the subsequent investigation.
A summarizing discussion will deal with the
appropriate interpretation of the results for the complex
case.
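The closed form can be compared with a truncated version of the two-sided sum. The sketch below assumes the transform is defined with $z^{-m}$ and that each exponential term sums to $(1 - r^2)/((1 - r z^{-1})(1 - r z))$ with $r = e^{-\alpha_k T}$; the function names and the truncation length are choices made for this illustration.

```python
import cmath
import math


def phi(m, C, alpha, T):
    """Autocorrelation (2) at lag m T."""
    return sum(c * math.exp(-a * T * abs(m)) for c, a in zip(C, alpha))


def spectrum_sum(z, C, alpha, T, terms=400):
    """Truncated two-sided sum of phi(mT) z^(-m)."""
    return sum(phi(m, C, alpha, T) * z ** (-m) for m in range(-terms, terms + 1))


def spectrum_closed(z, C, alpha, T):
    """Closed form: each exponential contributes two geometric series."""
    total = 0
    for c, a in zip(C, alpha):
        r = math.exp(-a * T)
        total += c * (1 - r * r) / ((1 - r / z) * (1 - r * z))
    return total


z = cmath.exp(0.7j)                      # a point on the unit circle
C, alpha, T = [1.0, 0.5], [0.3, 1.2], 1.0
print(spectrum_sum(z, C, alpha, T), spectrum_closed(z, C, alpha, T))
```

On the unit circle the value comes out real and positive for positive $C_k$, in line with the properties used in the factorization below.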
The values that $\Phi^*(z)$ assumes on the unit circle
$z = e^{i\omega T}$ are real and positive, and since

$$\oint_{|z|=1} \left| \log \Phi^*(z) \right| \frac{dz}{z} < \infty, \qquad (6)$$

$\Phi^*(z)$ can be considered the analytic continuation of
$\Phi(e^{i\omega T})$ within and without the unit circle. Further,
from (5),

$$\Phi^*(z) = \Phi^*\!\left(\frac{1}{z}\right); \qquad (7)$$

thus, $\Phi^*(z)$ is factorable [11]. In the particular case of
(5) this follows by inspection.


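$$\Phi^*(z) = \varphi(z)\,\varphi\!\left(\frac{1}{z}\right), \qquad (8)$$

where $\varphi(z)$ is free of poles and zeros within the unit circle.
Hence, the form for the inversion integrals,

$$\frac{1}{2\pi i} \oint_{|z|=1} \varphi(z)\, z^{m-1}\, dz, \qquad (9)$$

is zero for all positive $m$, and

$$\frac{1}{2\pi i} \oint_{|z|=1} \varphi\!\left(\frac{1}{z}\right) z^{m-1}\, dz \qquad (10)$$

is zero for all negative $m$.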
Properties (6), (7), the positiveness of $\Phi(e^{i\omega T})$, and a
theorem by Szego, allow the factorization (8). These
properties and theorem are the discrete equivalent of the
Paley-Wiener "factorability" criterion for continuous
processes. Thus, from (5) and (8)

$$\Phi^*(z) = \frac{\varphi_N(z)\,\varphi_N(1/z)}{\varphi_D(z)\,\varphi_D(1/z)}, \qquad (11)$$

where $\varphi_D(z)$ is a polynomial of degree $D$, $\varphi_N(z)$ one of
degree $N = D - P$, $P \ge 1$, and

$$\varphi(z) = \frac{\varphi_N(z)}{\varphi_D(z)} \qquad (12)$$

occurs in (8).
ANALYSIS 


Let (3) be written as 
M 
a P= Wal om’ — mf) 6, ~ (13) 


then 
Ine T= 0; both m, pe (0; iM): 


Notice that outside the interval (0, 17), the first expression 
on the right of (13) “decays” in a manner determined by 
the poles ¢(mT), since the convolved expression may be 
interpreted as the response of a nonrealizable digital filter 
(nonzero for both positive and negative time) to an 
input W(mT’) which is nonzero only over (0, M7). 

It then follows that the z transform of (14) may be 


written as 
I {et 
ne (1 pn(2) 


z 


(14) 


V*(2) is ge Ee (15) 


where $P_\mu(1/z)$ and $P'_\mu(z)$ are unknown polynomials of
their respective arguments, each of degree not greater
than $D - 1$ (since $\varphi_D$ is of degree $D$), hence, each of not
more than $D$ coefficients.


The straightforward transformation of (13), since it 
holds for all m’, gives us 


Vi@) = Wr@s*@) — <2. (16) 


Thus, we may equate (15) with (16) and derive a more 
explicit form for W*(z): 


az (Zep (4) 
oven) 


eur (Ll ++ Pi@en\ 4) 


on(2)en (4) 


$W_\mu(nT)$ is a physically realizable weighting sequence of
finite duration or memory. Hence, it is zero outside of
$(0, M)$, which requires $W^*_\mu(z)$ to be a polynomial in $z^{-1}$
of degree $M$. Thus, the problem is one of obtaining the
coefficients of $P_\mu(1/z)$ and $P'_\mu(z)$, or a particular set of
coefficients which are linearly and nonsingularly related
to those of $P_\mu(1/z)$ and $P'_\mu(z)$ in order to satisfy the above
condition.

If we expand (17) in its Dirichlet series form,


22 "W,(m) = {5° + > + he 


M+1 


AL ats hem — JT] 


WG) = 9 


=f (17) 


eee 
: ese (im — M — 17 
-ox( Son 
(ek saat | 
+ sas [(m + 1), ee 
| On ie Yow @) 4 


where the large square brackets signify the time sequence
whose transform is the enclosed expression.

Thus, for $W^*_\mu(z)$ to be an $M$ degree polynomial in $z^{-1}$,
it is both necessary and sufficient that the terms summed


ot 
nt 


(+ 


be identically zero. As a time sequence, we have 


P,(L)onl Pr@en( 2) 
Soe me (Gn DE el) Soak [((m + 1)T] 
 on( ont) onl owl) 
1 1 
= =| aby lle — pT] 
for mz (0, M), pe (0, M). (19) 




It is then necessary to obtain explicitly the expressions 


ol Vine) ~ el let) 


which satisfy (19). These functions are obtainable as
linear combinations of known rational functions in $z$ and
$z^{-1}$. The coefficients of these known functions are then
to be determined. $P_\mu(1/z)$ and $P'_\mu(z)$ are unknown polynomials
of degree not greater than $D - 1$; thus assume
they both are of degree $D - 1$, with possible zero coefficients.
Hence,


and 


roe RO 
Seas eee a 
Pe are A) 
Sr ae 


where Q,, Q/ are of degree D — N — 1 and R,, Ri, of 
degree NV — 1. Similarly, 


gn( ) a 


gn ( ) 


= Q) hae (21) 


4 


for Q and R of degree D — N and N — 1 respectively. 
Thus, (19) becomes 


n() 


1 ee BO: 
a) me) (60 a Ha) ee aeebyy 
Sit | 
+ (a + 22) a(t) + a) ((m + 17] 
ox(3) 
8 1 | 
= -|(d + 22)io(t) + a) [(m — wT} (22) 
ex(4) 4 


for m g (0, M), uw « (0, MW); thus, for m in the intervals 
(M+ 1,M@M+ D-—N) and (— 1, - D+ N), all of the 
terms of the above equations are necessary. This gives 
2D — 2N equations over the 2D — 2N time points. 


In consideration of the fact that 


$ dz eee =n0. 


whnm>M+1+D-—-—NandM< D—WN —1, the 
following modified equations are obtained: 


MF~N, 


‘a)m>M+1+D—N, 


nf) 
atl 


(0 + 22.) (om — a 


— 17] 


‘ (1 
+|(o@ + 22) “ ((m + T] 
Ss a 
f = R(z) a(t 
- 5 [Ge + HO.) “(0 (im — wT], 23) 
"\z 
)m<—-D+N-2 
de lass 
a,(t) + zi EO |(m — M = 17) 
Z 
© Aft es 
+|B2 Ja(4) + “) [(m + 0) 
| on} 
= an 
=- a(t) + ) oe [(m— wT]. (24) 
oe elf) 
Let us choose our coefficients so that 
a) (22) is true for m > M + 1 and (25) 


b) (28) is true form < — 1. 


It is clear that (25) a) and b) imply (23) and (24), respectively,
so there is no loss in generality. Thus, (22)
is then reducible to two independent sets of equations as
follows:


| (ae) + 


aos (2 ) fe — M — 1)T] 


i | (a fe ae Nit up 


O<uxsM, M+14+DdD-Ne>m>M-4+1. (26) 
This admits a solution of the form 
E Ie ) Jor = = E u(4 ) lo +M+1-—y)T] 

t= 0: 28 


Since both $Q_\mu(\ )$ and $Q(\ )$ are $D - N - 1$ degree
polynomial forms this yields the $D - N$ coefficients of
$Q_\mu(\ )$ immediately. Similarly, for $-1 \ge m \ge -D + N$,
(22) takes the form
R 1 


a(t) + i) XE [lm + YT} 
e @ 


i?) 


= |)62)+ Aloe fm — ores) 
on(4) 
_ with the solution analogous to that of Q,, 
| [VO lmT) = @ilm — 1 -— wT} 
n<0. (29) 


Thus, $2(D - N)$ coefficients have been obtained. Note
that if $N = 0$, the problem is solved; this is the condition
for which Wise's solution takes a particularly elegant
form. The case for $N \ne 0$ will now be treated. The
approach will be to use the already determined $2(D - N)$
coefficients of (27) and (29) in (23) and (24), express
(23) and (24) in terms of arguments defined over $(0, \infty)$
and $(-1, -\infty)$, respectively, and then to establish sets of
difference equations over the respective domains by
transposition of the denominators, $\varphi_N(1/z)$ and $\varphi_N(z)$.


Consider the definitions R,, R,- such that 


a(?) a) 


i (p + n)T] = ; 1) p70. (80) 
onl) ov(3) 
and 
R(2) | Ee | 
7 = ; 
E ©) [(p + n)T] = @) (nT); p,n = 0 (31) 
This is justified easily by the following. Let 
ia il N-1 
(4) = 3 (oe (32) 
and 
= if 
Be eee 
py Px 
i)" a ae (33) 
Yn s 
Then it follows that 
fe Hl N-1 
(4) = Sine”, (34) 
where 
N N 
(Dp) = (—)" pa, yep Ll ve +* Ye 
(hy seeks ni, SAK”), (35) 




[, denoting the product over n different subscripts, n 
running from 0 to N — 1. It is also true that 


R’\5,(2) a Ri). 


Thus, if (23) and (24) are to hold for $m \ge M + 1$ and
$m \le -1$, respectively, it is sufficient that the following
modified systems be satisfied:


| eot@R,(1) + Rarea()R) + onl Baes-s(4) 


(36) 


+ ov@)Q; But LY loa = 0,2 ee 01 80) 


and 
| taelR,(*) + gol) + oo(*) Ber 


1 1\- 
=e oo(1)0,( aaa Jon = 0 LeeSiVe (38) 
Here the coefficients of $R_\mu(1/z)$ and $R'_\mu(z)$, each numbering
$N - 1$, must be determined. Now, let us consider the
Dirichlet series forms,


ool) = Died, (39) 

ex OQ = ine, (40) 
cel) = Suet oe 
R(t) = Sore”, 42) 
Rie) = Dre, (43) 


and (33) and (36). Eqs. (37) and (38) thus become two 
triangular systems of N — 1 equations each: 


N-1 N-1-n 
OS, Om=nl n,m a5 Sa Pam Ml me ZR 
m=n m=0 
N-1 
= — Demat + 1 — py) Te Xmen 2) 8) 
and 
Melisa 7 N-1 
ps Tost nM SP Dire SIE 3 Om- wre m 
m=0 man 
N=1 
—S femnfn( EL +p) + Vn-nFm(M + 2)} 
in ="0,5 ,N — 1}. (45) 


By a succession of elementary operations (row and
column multiplication and addition), all but the diagonal
terms of the matrix of system (44), (45) can be eliminated,
i.e., set to zero. Thus, the necessary and sufficient condition
for the matrix of (44), (45) to be nonsingular is that all of
its diagonal terms be nonzero.

If we start with $n = N - 1$ and work our way to $n = 0$,
both (44) and (45) are easily solved as triangular systems.
But our solutions will be in the form of pairs.
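The order of solution described here, last index first, is ordinary back substitution. The sketch below illustrates it for a generic upper-triangular system; the matrix `U` and right side `b` are placeholders and do not reproduce the coefficients of (44) or (45).

```python
def back_substitute(U, b):
    """Solve U x = b for upper-triangular U, starting from the last row."""
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        if U[i][i] == 0:
            # A zero diagonal term means the triangular system is singular.
            raise ValueError("zero diagonal term")
        known = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - known) / U[i][i]
    return x


U = [[2.0, 1.0, -1.0],
     [0.0, 3.0, 2.0],
     [0.0, 0.0, 4.0]]
b = [1.0, 13.0, 8.0]
x = back_substitute(U, b)
print(x)
```

The zero-diagonal check mirrors the nonsingularity condition stated above: the system is solvable exactly when every diagonal term is nonzero.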


Onlin + fyi(M | A ee oii fine (46) 
and 
Fy-(M ae 2)rn = Onl N-1—n = fen—1—n (47) 
T= Os teres Nees 
where f,, and foy_,_, are the triangular solutions. 
Hence, with the f,,, fey—1_,’8 obtained, the coefficients 
of R,(1/z) and R/(z), r, and r% are respectively derivable 
from each pair of simultaneous equations (46) and (47). 


MORE EXPLICIT REPRESENTATION OF SOLUTION FORM


Let us make the further definitions:


N-1 
Ba age (T] (1 — en Pr-B Py ya Be 
N=t 
b) oe = {TT ae OVS re, 
ky kr n=0 
ss D-N ¥ 
90>) Oe, 
0 
Di Nel 
d) Q, (2) = oy 72 
0 
Da 
2) VOQ= d| Qe’ 
0 
D—-N $ a 
f) a QQ, -r) 
r/=1 
D-N J 
g) qo ya ee 
Bb =0, 01, 
- B, Dr 
i) Ce we on l nal go (bet Be ) 
r+D—N=1 he 
)) 1s aa »S Oe sce: 
r=0 
k) is So GQ ap GOs, 
I) Gar a ~ OO. 
D-N 
hy, ay 
my oo = PrPur + BrPur ong (48) 


et is pee 
n) The primed coefficients 4, F’, G4, and H’ have the 
same form as the corresponding ones in j) through 
m), but Q,, p, are replaced by Q/, p’. 
Upon using the definitions in (48) in the explicit form 
of the inverse to the covariance matrix, operationally 
expressed as 




(27h. = W,(mT) = Ea SVE 


(2) ; 
+ ee, 
Lom ewe) 
= noel), taal 
en @enl 


for 0 < m, u < M, and 
= 0, otherwise, (49) 


Te tae = b((y Tom mT), 


we obtain a more explicit representation of the solution | 
form: 


D-N 
W,,(mT) ape > CGS tas a beans) to do Onur 
r=1 


DWN N Aye an ; 
oe a en Ce iid T(m—p-rT) ae o =" Sea 
r=0 r/=1 | 
N | 
Ms Ci 
is) 
= (Ez Ona 4 Wakes Once 
y=—-D+N-1 
aa oe Om ane a |B Omati =) 
D-—N 
Pa (Gz On airet tars 5 Gas oan) 


(50) 


where the (+) and (-) labels refer to functions which
are nonzero only for positive and negative arguments,
respectively.


CONCLUSION


Implicit in the preceding analysis is the condition that
the memory $M$ of the filter, expressed as the number of
samples of input functions that are operated upon, must
be equal to or greater than the number of exponentials
$D$ in the input correlation function. On this basis, the
optimal parameters are obtainable in the solution of a
system of $2D$ linear equations. If this memory is actually
less than $D$, the solution still may be obtained by the
given method, but the number of equations is equal to the
memory. This follows by virtue of the actual effective
number of linearly independent exponentials in the input
correlation function over the interval $(0, M)$; thus, if
$M < D$ we may choose $M$ of the exponentials to represent
the behavior of the input correlation function. Although
the method described by the preceding work will apply to
this case, it does not appear to have any advantages over
the conventional time domain matrix inversion techniques
since the number of equations and unknowns is the same
in both cases.

If the correlation function has exponentially damped
periodic components, its transform will have conjugate
pairs of poles and zeros within and outside of the unit
circle. Thus, (22) will keep the same form, but in the
partial fraction expansion of the quotient of


go(2) 
en(2) ’ 


both $\beta_r$, $B_r$, and their conjugates will occur. The form of
the analysis, however, will remain the same.

Let us multiply the transform of the optimal weighting
sequence (50) by $T$, the sampling period, set $z = e^{sT}$ and
then take the limit as $T \to 0$, but restrict the memory
duration $MT$ to remain constant. The discrete process
approaches a continuous one, and the z-transformed
expression (50) becomes the Laplace transform of a finite
memory filter which operates continuously over the
interval $(0, MT)$. Thus, the transform of the optimal
finite memory filter for a continuous input process should
be obtainable as an asymptotic form of the discrete
case [12].


EXAMPLE 


As an example, let us consider the case where signal
and noise are stationary random. Assume that the signal
and noise correlation functions are given by

$$\sigma_s^2 e^{-\beta |nT|} \quad \text{and} \quad \sigma_n^2 e^{-\gamma |nT|}, \qquad (51)$$

respectively. $T$ is the sampling interval. For no cross
correlation, the input autocorrelation function is

$$\phi(nT) = \sigma_s^2 e^{-\beta |nT|} + \sigma_n^2 e^{-\gamma |nT|}. \qquad (52)$$
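A numerical sketch of the spectral factorization this Example relies on, assuming the autocorrelation (52) with signal and noise powers `sig_s2`, `sig_n2` and decay factors `beta`, `gamma`. The symbols `A`, `B` and `r` are introduced for this sketch: over the common denominator the numerator of the spectrum has the form $B - A(z + 1/z)$, and `r` is its root inside the unit circle. They need not match the normalization used in the paper's own expressions.

```python
import cmath
import math


def numerator_coefficients(sig_s2, sig_n2, beta, gamma, T):
    """A and B such that the spectrum numerator is B - A (z + 1/z)."""
    b, g = math.exp(-beta * T), math.exp(-gamma * T)
    A = sig_s2 * (1 - b * b) * g + sig_n2 * (1 - g * g) * b
    B = sig_s2 * (1 - b * b) * (1 + g * g) + sig_n2 * (1 - g * g) * (1 + b * b)
    return A, B


def numerator_root(A, B):
    """Root r in (0, 1) with B - A (z + 1/z) = (A / r)(1 - r z)(1 - r / z)."""
    if B <= 2 * A:
        raise ValueError("need B > 2A for a real root inside the unit circle")
    return (B - math.sqrt(B * B - 4 * A * A)) / (2 * A)


def spectrum_direct(z, sig_s2, sig_n2, beta, gamma, T):
    """Sum of the two single-exponential spectra."""
    b, g = math.exp(-beta * T), math.exp(-gamma * T)
    return (sig_s2 * (1 - b * b) / ((1 - b / z) * (1 - b * z))
            + sig_n2 * (1 - g * g) / ((1 - g / z) * (1 - g * z)))


def spectrum_factored(z, sig_s2, sig_n2, beta, gamma, T):
    """Same spectrum written as a ratio of factored polynomials (D = 2, N = 1)."""
    b, g = math.exp(-beta * T), math.exp(-gamma * T)
    A, B = numerator_coefficients(sig_s2, sig_n2, beta, gamma, T)
    r = numerator_root(A, B)
    numerator = (A / r) * (1 - r * z) * (1 - r / z)
    denominator = (1 - b / z) * (1 - b * z) * (1 - g / z) * (1 - g * z)
    return numerator / denominator


params = (2.0, 1.0, 0.4, 1.5, 1.0)       # illustrative values only
z = cmath.exp(0.9j)
print(spectrum_direct(z, *params), spectrum_factored(z, *params))
```

The condition $B > 2A$ is what keeps the numerator root real, so that no complex quantities need to be introduced.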
“he z transform of the sequence ¢(n7’), has the form 
Fe an 
rere rere 
os) ov@ox(] 
oes ek 
OS cent) 
rhere 
k= as 
By Bi 4 At” 
ee =e ee + ol — ee (54) 
Bi = (cy + onl —¢ *") — e**"), (55) 
od 
1 : 
poe in (Ft tae i) Wee 




Here, it is assumed that $B_1 > 2A_1$. This does not restrict
the subsequent procedure except in eliminating the
temporary introduction of complex quantities. Thus,


gn(2) pi s 
on(2) ==, Giz an Qo + (ae eet, ) (57) 
where 
io a eae 
QO; sarees Vi ) 
Oo = Fz (Gatos eee oes ea) 
Bi an ju + erety-2a)r ver 6 eee oe peo (58) 
Here we have D = 2,N = 1,D —N = 1. Thus, Q(1/z) = 
Qo + Q,2 ’ and from (27) and (29), respectively, 
fol) an = [oleae 
— Q: On eae ete = Di 8n,0 Ou,M (59) 
[M@lmL) = (Q@llm — 1 — wT] 
= Q: Oni ea = Q: bn,0 6u,0° (60) 
Upon using (30), or (35), 
#,(4) = 8@ = <""R. 61) 


Since $N - 1 = 0$, only the zeroth terms in (39) through
(45) are necessary. Note that [from (39), (40), and (41),
respectively],


go = 1, Lo = Q, 6 = Q: 6u,0- (62) 


Hence, the resulting pair of equations, 


u,M) 


—_ ,-aT(M+2 am —aT(M+1—n) 
Tu.0 2 pe aT ( nh as e a@ B 


=f) p~ a2 (M +2) 
+ piQie . Ou, at 
and 


= ,~aT(M+2) , = ,-~ or (1+p) 
pre s Ty,0 | Tn,0 16 


=f) ,>~@7(M+2) 
se DiWie Z Ou,0 


Thus, the remaining unknowns $r_{\mu,0}$ and $r'_{\mu,0}$ are easily
obtained:


(63) 


feats = {J Ba pie Spree Sen te 


‘ Va (Chea ame ah Ones oa 5,0) 


pox rp Seema a Oger ee baa) 


jee ae {1 as | Ek on 
lanes ce att at Ogsevn Ont) 


a pice a8 Ciecemens Spann (64) 


Hence, from (59) and (60), referring to (48d) and (48e), 


Oe = h: 6,,0 Oy, m and Qi. = eae’ 6,,0 Oyn,05 (65) 
and the four coefficients $Q_{\mu,0}$, $Q'_{\mu,0}$, $r_{\mu,0}$, $r'_{\mu,0}$ have been
obtained. Since the degree of $\varphi_D$ is 2, the necessary and
sufficient number of coefficients is also $2D = 4$.

On referring to definitions (48), the $\mu, m$th element of
the inverse matrix takes the form


W,(mT) = (Qo =f Qi) On—p,0 
= DONO: Cae, = fs Oneare) 


as Qipie(+)7 re + Cela ek 


Pr2 
] _ ox uy 


—aT\|m—pz\ 


+ QoQs.0 Sm—ar—14r + QoQi.0 Smtt,r 

+ Q:0:.6 Smeuya + QQ; Omer 

see inna a iO ants ae 
tenet  Guel FAH 2) 
+ r4.o(Que(—)°7""? + Gye?) 


24 Dir, 0 + tig, el pee M- ae 


1 os Beet 


(66) 




BIBLIOGRAPHY 


[1] A. C. Aitken, "On least square and linear combination of prediction," Proc. Roy. Soc. (Edinburgh), vol. 55, pp. 42-4; November, 1934.

[2] A. B. Lees, "Interpolation and extrapolation of sampled-data," IRE TRANS. ON INFORMATION THEORY, vol. IT-2, pp. 12-1; March, 1956.

[3] M. Blum, "An extension of the minimum mean square prediction theory for sampled input signals," IRE TRANS. ON INFORMATION THEORY, vol. IT-2, pp. 176-184; September, 1956.

[4] N. Wiener, "Extrapolation, Interpolation and Smoothing of Stationary Time Series," John Wiley and Sons, Inc., New York, N. Y.; 1949.

[5] J. Wise, "The autocorrelation function and the spectral density function," Biometrika, vol. 42, pp. 151-159; 1955.

[6] M. M. Siddiqui, "The inversion of the sample covariance matrix in a stationary autoregressive process," Annals of Math. Statistics, vol. 29, pp. 585-588; June, 1958.

[7] The author is indebted to M. Blum for this realization.

[8] H. Wold, "A Study in the Analysis of Stationary Time Series," Almquist and Wiksell, Stockholm, Sweden; 1938, 1954.

[9] G. Franklin, "Linear filtering of sampled-data," 1955 IRE CONVENTION RECORD, pt. 4; pp. 119-128.

[10] D. C. Youla, "The solution of a homogeneous Wiener-Hopf integral equation occurring in the expansion of second-order stationary random functions," IRE TRANS. ON INFORMATION THEORY, vol. IT-3, pp. 187-193; September, 1957.

[11] G. Szego, "On the boundary value of an analytic function," Math. Ann., vol. 84, pp. 232-244; 1921.

[12] That the optimal continuous filter is attainable as an asymptotic form of the optimal digital filter has been proved in the time domain in the paper by P. Swerling, "Optimum linear estimation for random processes as the limit of estimates based on sampled data," 1958 IRE WESCON CONVENTION RECORD, pt. 4; pp. 158-163.

[13] L. A. Zadeh and J. R. Ragazzini, "An extension of Wiener's theory of prediction," J. Appl. Phys., vol. 21, pp. 645-655; July, 1950.

[14] W. A. Janos, "Optimal Filtering of Periodic Pulse-Modulated Time Series," Ph.D. dissertation, University of California, Berkeley, Calif.; 1958.

[15] R. Mittra, "On the Solution of a Class of Wiener-Hopf Integral Equations in Finite and Infinite Ranges," Antenna Lab., University of Illinois, Urbana, Ill., Tech. Rept. No. 37; 1959.




An Isospectral Family of Random Processes* 


RICHARD A. SILVERMAN†, SENIOR MEMBER, IRE


Summary—We construct a family of random step functions 
$\{x_n(t)\}$ whose members all have the same power spectrum and
such that as $n \to \infty$, $x_n(t)$ converges to $x_\infty(t)$, the Gaussian process
with the same spectrum. We illustrate the procedure for calculating
the general multivariate distribution of the processes $\{x_n(t)\}$ by
calculating the univariate, bivariate and trivariate distributions.
We show how a suitably constructed univariate entropy can serve
as an index of the extent to which $x_n(t)$ has approached the Gaussian
limit $x_\infty(t)$.


INTRODUCTION 


CONSIDER a zero-mean stationary random process
$x(t)$ with correlation function $C(\tau)$, i.e., a process
$x(t)$ for which

$$E\,x(t) = 0, \qquad E\,x(t)\,x(t + \tau) = C(\tau),$$

where $E$ is the expectation operator. A considerable part
of noise theory is devoted to study of the case where $x(t)$
is Gaussian. In this case, the law of $x(t)$, i.e., the probability
that $x(t_n) \le \xi_n$, $1 \le n \le N$, $1 \le N < \infty$, where
the $\xi_n$ are arbitrary real numbers, is the familiar $N$-variate
Gaussian distribution whose moment matrix can be
expressed simply in terms of $C(\tau)$.¹ Thus, in the Gaussian
case, $C(\tau)$ uniquely specifies $x(t)$, which is, of course, the
particular beauty of this case. More generally, $C(\tau)$ gives
only a more or less incomplete characterization of $x(t)$.

The object of the present paper is to describe and study
an infinite family $\{x_n(t)\}$, $1 \le n < \infty$, of non-Gaussian
random processes, which converge to a Gaussian process
$x_\infty(t)$ as $n \to \infty$, and whose laws are calculable, at least in
principle. The family $\{x_n(t)\}$ will be constructed in such
a way that all its members have the same correlation
function $C(\tau)$, or equivalently, the same power spectrum

$$P(\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \exp(-i\omega\tau)\, C(\tau)\, d\tau.$$

Briefly, we shall say that $\{x_n(t)\}$ is an isospectral family.
Since as $n \to \infty$, the general appearance of the sample
functions of $x_n(t)$ changes and approaches that of the
Gaussian process $x_\infty(t)$ with the same power spectrum
(or correlation function), the general inadequacy of
$P(\omega)$ or $C(\tau)$ as a means of characterizing random processes
and the general need for higher-order statistics are
put into rather strong focus.

The method we use to construct $x_n(t)$ from an underlying
shot noise (Poisson process) is a familiar one. The
novelty of our treatment consists in observing that when
the shots are rectangular the members of $\{x_n(t)\}$ can be
described in theoretically complete detail.


* Received by the PGIT, January 11, 1960. The research reported
in this article has been sponsored by the Air Force Cambridge
Research Center, Air Research and Development Command,
under Contract No. AF 19(604)5238.

† Institute of Mathematical Sciences, New York University,
Waverly Place, New York, N. Y.

¹ S. O. Rice, "Mathematical Analysis of Random Noise," reprinted
in the collection "Selected Papers on Noise and Stochastic
Processes," N. Wax, Ed., Dover Publications, Inc., New York,
N. Y., pp. 181-183; 1954.


CONSTRUCTION OF $\{x_n(t)\}$


Let the sequence $t_j$ ($j = \cdots, -1, 0, 1, \cdots$) be the
occurrence times of the shots in a stationary shot noise
(Poisson process) with an average rate of $\rho$ shots per
second. More specifically, we have the following five
properties (among others):


1) The occurrence of $m$ shots in the interval $I$ and the
occurrence of $n$ shots in the interval $I'$ ($m, n = 0, 1, 2, \cdots$)
are independent events if $I$ and $I'$ do not overlap.

2) The probability of one shot in an infinitesimal
interval of length $\Delta t$ is $\rho \Delta t + o(\Delta t)$, whereas the probability
of more than one shot is $o(\Delta t)$.

3) The probability of $m$ shots in any interval of length
$T$ is given by the Poisson distribution

$$p(m; \lambda) = \frac{\lambda^m}{m!} e^{-\lambda}, \qquad m = 0, 1, 2, \cdots,$$

where $\lambda = \rho T$, the parameter of the Poisson distribution,
is the common value of the mean and variance of a random
variable which takes the values $m = 0, 1, 2, \cdots$ with
probabilities $p(m; \lambda)$.

4) Given that $m$ shots have occurred in an interval of
length $T$, their occurrence times $t_1, \cdots, t_m$ (without
regard to order of appearance) are independent, identically
distributed random variables with the uniform
probability density $1/T$.

5) The probability density of the interval between
successive shots is $\rho \exp(-\rho t)$.


For discussion and derivation of these properties we
refer the reader elsewhere.²⁻⁴
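Properties 3) and 5) lend themselves to a quick numerical illustration. In the sketch below, `poisson_pmf` evaluates the distribution of property 3), and `shot_times` builds occurrence times from independent exponential gaps as in property 5). The function names, the seed and the parameter values are assumptions of this sketch, and the gaps are drawn with a pseudo-random generator rather than from printed tables.

```python
import math
import random


def poisson_pmf(m: int, lam: float) -> float:
    """p(m; lambda) = lambda^m exp(-lambda) / m!  (property 3)."""
    return lam ** m / math.factorial(m) * math.exp(-lam)


def shot_times(rho: float, T: float, rng: random.Random) -> list:
    """Occurrence times in (0, T) built from exponential gaps (property 5)."""
    times, t = [], 0.0
    while True:
        # Gap with density rho * exp(-rho * t), by inverting its distribution.
        t += -math.log(1.0 - rng.random()) / rho
        if t >= T:
            return times
        times.append(t)


lam = 3.0
mean = sum(m * poisson_pmf(m, lam) for m in range(60))
var = sum(m * m * poisson_pmf(m, lam) for m in range(60)) - mean ** 2
print(mean, var)                          # both close to lambda

rng = random.Random(1)
counts = [len(shot_times(2.0, 5.0, rng)) for _ in range(2000)]
print(sum(counts) / len(counts))          # close to rho * T = 10
```

The average count over many simulated intervals should settle near $\rho T$, tying the gap density of property 5) back to the parameter $\lambda = \rho T$ of property 3).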

Now let $h(t; a)$ be a step function of unit height and
width $a$, i.e.,

$$h(t; a) = 1, \quad 0 \le t \le a; \qquad h(t; a) = 0, \quad \text{otherwise}. \qquad (1)$$


2 W. Feller, ‘‘An Introduction to Probability Theory and its 
Applications,’’ 2nd ed., John Wiley and Sons, Inc., New York, N. Y., 
pp. 146, 400; 1957. 

3 Rice, op. cit., see pt. I. 

4A. Blanc-Lapierre and R. Fortet, ‘Théorie des Fonctions 
Aléatoires,’’ Masson, Paris, ch. 5; 1953. 




To construct the isospectral family $\{x_n(t)\}$, we write

\[ x_n(t) = \frac{1}{\sqrt{n}} \sum_{j=-\infty}^{\infty} h(t - t_j^{(n)}; a) - \sqrt{n}\,a, \qquad 1 \le n < \infty, \tag{2} \]

where the $t_j^{(n)}$ are random times belonging to a shot noise
with an average rate of $n$ shots per second. It is clear that
each $x_n(t)$ is a stationary random step function. To
calculate expectations of lagged products of $x_n(t)$ we
follow the usual procedure: First we use 4) to calculate
conditional expectations for a long finite interval of
length $T$ containing exactly $N$ shots and then we average
over $N$ using 3). Again, we refer to the literature for
details.⁵ In particular, we find

\[ E\,x_n(t) = 0, \qquad E\,x_n(t)\,x_n(t + \tau) = C(\tau), \]

where $C(\tau)$ is given by

\[ C(\tau) = \int_{-\infty}^{\infty} h(t; a)\, h(t + \tau; a)\, dt = \begin{cases} a - |\tau|, & |\tau| \le a, \\ 0, & |\tau| > a. \end{cases} \tag{3} \]


The corresponding common power spectrum of the
isospectral family $\{x_n(t)\}$ is


Dee aC GAlpcn mar) aero: 


-o 


Pw) = on 


The constant $a$ is still at our disposal. In order to make
the members of $\{x_n(t)\}$ corresponding to small values of $n$
drastically non-Gaussian, we can choose $a \ll 1$, so that
there is negligible overlap of the shots making up $x_n(t)$
if $n \ll 1/a$. Then for $n \sim 1/a$ overlap of the shots begins
to be appreciable and as $n$ increases further, the process
$x_n(t)$ approaches the Gaussian process $x_\infty(t)$ with the same
power spectrum.⁶


TYPICAL SAMPLE FUNCTIONS OF $\{x_n(t)\}$


To construct typical sample functions of $x_n(t)$, we first
use random number tables⁷ to find sample values of a
random variable $y$ which is uniformly distributed in the
unit interval (0, 1). We then observe that the new random
variable

\[ \eta = \frac{1}{\rho} \log \frac{1}{1 - y} \tag{4} \]

has the probability density $\rho \exp(-\rho t)$ of the interval


5 Rice, op. cit., see pts. I, II.

6 Specifically, the type of convergence we have in mind is
convergence in distribution, i.e., the law of $x_n(t)$ converges to that of
$x_\infty(t)$.

7 The RAND Corporation, "A Million Random Digits with
100,000 Normal Deviates," Free Press, Glencoe, Ill.; 1955.




between successive shots in a Poisson process with average
rate $\rho$. To see this, note that

\[ \mathrm{Prob}\,[F^{-1}(y) \le \xi] = \mathrm{Prob}\,[y \le F(\xi)] = F(\xi), \]

so that $F^{-1}(y)$ has the distribution function $F(\xi)$. If $F(\xi)$
is to be the distribution function corresponding to the
probability density which is $\rho \exp(-\rho t)$ for $t \ge 0$ and
zero otherwise, then $F(t) = 1 - \exp(-\rho t)$, whence (4)
follows at once. Moreover, if $y$ is uniformly distributed in
(0, 1) so is $1 - y$. Thus, sample values of the random
variable

\[ \frac{1}{\rho} \log \frac{1}{y} \tag{5} \]

generate time markers locating the shots in a Poisson
process with an average rate of $\rho$ shots per second. Specifically,
if $\eta_1, \ldots, \eta_m$ are $m$ sample values of (5), we get
$m$ random times $t_1^{(\rho)}, \ldots, t_m^{(\rho)}$ by writing

\[ t_1^{(\rho)} = \eta_1, \quad t_2^{(\rho)} = \eta_1 + \eta_2, \quad \ldots, \quad t_m^{(\rho)} = \eta_1 + \eta_2 + \cdots + \eta_m. \]

Finally, with these values of the random times, we use
(2) to construct sample functions of $x_n(t)$.
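The procedure just described can be sketched in a few lines. This is an illustrative sketch only: it substitutes a pseudo-random generator for the random number tables used by the author, and the function names, the seed and the sampling grid are assumptions introduced here.

```python
import math
import random

def shot_times(rate: float, t_end: float, rng: random.Random) -> list:
    """Occurrence times in (0, t_end): cumulative sums of (1/rate) * log(1/y), as in (5)."""
    times, t = [], 0.0
    while True:
        y = 1.0 - rng.random()           # uniform on (0, 1]; avoids log(1/0)
        t += math.log(1.0 / y) / rate    # one exponentially distributed interval
        if t >= t_end:
            return times
        times.append(t)

def x_n(t: float, times: list, n: int, a: float) -> float:
    """Evaluate (2): (number of shots with 0 <= t - t_j <= a) / sqrt(n) - sqrt(n) * a."""
    count = sum(1 for tj in times if 0.0 <= t - tj <= a)
    return count / math.sqrt(n) - math.sqrt(n) * a

rng = random.Random(1960)                # arbitrary seed, for reproducibility only
n, a = 4, 0.25
times = shot_times(rate=n, t_end=5.0, rng=rng)
# Sample the record on a 0.01 s grid, skipping the first second of the record.
samples = [x_n(0.01 * k, times, n, a) for k in range(100, 500)]
```

Every value in `samples` lies on the lattice $m/\sqrt{n} - \sqrt{n}\,a$, which is a quick sanity check on the construction.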

Figs. 1-3 show the results of this procedure for the
cases of low, medium and high density shot noise,
respectively. In each case $a$ was chosen to be 0.25. Fig. 1
shows a four-second sample of the process

\[ x_1(t) = \sum_{j=-\infty}^{\infty} h(t - t_j^{(1)}; 0.25) - 0.25, \]

Fig. 2 shows a four-second sample of the process

\[ x_4(t) = \frac{1}{2} \sum_{j=-\infty}^{\infty} h(t - t_j^{(4)}; 0.25) - 0.50, \]

and Fig. 3 shows a one-second sample (drawn on a different
scale) of the process

\[ x_{16}(t) = \frac{1}{4} \sum_{j=-\infty}^{\infty} h(t - t_j^{(16)}; 0.25) - 1.00. \]

The circle in Fig. 2 shows a level where the sojourn of
$x_4(t)$ was so brief that it could not be indicated on the
scale of Fig. 2. A similar remark applies to the three
circles appearing in Fig. 3. In each case, the sample functions
were given the initial value of $-\sqrt{n}\,a$, i.e., $-0.25$
for $x_1(t)$, $-0.50$ for $x_4(t)$ and $-1.00$ for $x_{16}(t)$. When
starting the sample functions of $x_4(t)$ and $x_{16}(t)$, atypical
portions lasting about $a = 0.25$ seconds occur, due to
the fact that there are no shots before $t = 0$.⁸ (Theoretically,
each $x_n(t)$ should begin in the infinitely remote
past.) To eliminate this transient behavior, we suppressed
the first second of the record of $x_4(t)$ and an initial
portion of the record of $x_{16}(t)$.


8 This effect is unimportant for $x_1(t)$, since in this case
overlap of shots is slight.


Fig. 1. Sample function of $x_1(t)$ (abscissa: $t$, sec).

Silverman: An Isospectral Family of Random Processes 


Fig. 2. Sample function of $x_4(t)$ (abscissa: $t$, sec).

Fig. 3. Sample function of $x_{16}(t)$ (abscissa: $t$, sec).


PROBABILITY DISTRIBUTIONS OF $\{x_n(t)\}$


The family $\{x_n(t)\}$ has the desirable feature that we
can calculate the law of each $x_n(t)$, i.e., all the multivariate
probability distributions of $x_n(t)$, although the amount of
work required to calculate high-order probability
distributions is considerable. We shall illustrate the general
procedure by deriving formulas for the univariate,
bivariate and trivariate probability distributions of $x_n(t)$.


A. Univariate Distribution 


By (1) and (2), the range of possible values of the
random variable $x_n(t)$, $t$ fixed,⁹ is the lattice

\[ y_{m,n} = m/\sqrt{n} - \sqrt{n}\,a, \qquad m = 0, 1, 2, \ldots. \]

The probability that $x_n = y_{m,n}$ is just the probability
that in the $a$ seconds preceding $t$ precisely $m$ shots occur.
Consequently, denoting the probability that $x_n = y_{m,n}$
by $P_n(m)$, we have

\[ P_n(m) = p(m; na) = \frac{(na)^m}{m!}\, e^{-na}, \tag{6} \]


9 The value of $t$ is irrelevant since $x_n(t)$ is stationary, so that we
can drop the argument $t$.


where $p(m; na)$ is the Poisson distribution with parameter
$na$. Elementary calculations verify that

\[ \sum_{m=0}^{\infty} y_{m,n} P_n(m) = 0, \qquad \sum_{m=0}^{\infty} y_{m,n}^2 P_n(m) = a, \qquad 1 \le n < \infty, \]

as required. Note that $P_n(m)$ can be regarded as the distribution
of the sum of $n$ independent, identically distributed Poisson random
variables with parameter $a$. Thus, the convergence of
$x_n(t)$ to a zero-mean Gaussian random variable with
variance $a$ is a particularly simple case of the central
limit theorem.¹⁰ Note also that $x_n(t)$ can take arbitrarily
large positive values, but no negative values less than
$-\sqrt{n}\,a$; this asymmetry of the sample functions
disappears as $n \to \infty$.
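The two moment identities can be confirmed by direct summation over the lattice. The sketch below is illustrative only; the function names and the truncation point are assumptions, not part of the paper.

```python
import math

def poisson_pmf(m: int, lam: float) -> float:
    """p(m; lam), evaluated in log space."""
    return math.exp(m * math.log(lam) - lam - math.lgamma(m + 1))

def lattice_moments(n: int, a: float, m_max: int = 400):
    """First and second moments of x_n over the lattice y_{m,n} = m/sqrt(n) - sqrt(n)*a."""
    lam = n * a
    first = second = 0.0
    for m in range(m_max + 1):
        y = m / math.sqrt(n) - math.sqrt(n) * a
        p = poisson_pmf(m, lam)
        first += y * p
        second += y * y * p
    return first, second

for n in (1, 4, 16):
    print(n, lattice_moments(n, 0.25))   # mean near 0, second moment near a = 0.25
```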


B. Bivariate Distribution 


The probability that $x_n(t) = y_{\ell,n}$ while $x_n(t + \tau) = y_{m,n}$
will be denoted by $P_n(\ell, m; \tau)$ and is independent of
$t$, since $x_n(t)$ is stationary. The event that precisely $m$
shots occur in the interval $(u, v)$ will be denoted by
$E_m(u, v)$. Then, clearly, $P_n(\ell, m; \tau)$ is the joint probability

10 The fact that the Poisson distribution $p(m; \lambda)$ approximates
the Gaussian distribution for large values of $\lambda$ is noted by Feller,
op. cit., see p. 176.




of the events $E_\ell(-a, 0)$ and $E_m(\tau - a, \tau)$. If $|\tau| > a$,
these two events are independent and we have

\[ P_n(\ell, m; \tau) = P_n(\ell) P_n(m) = \frac{(na)^{\ell} (na)^{m}}{\ell!\, m!}\, e^{-2na}. \]

On the other hand, if $\tau = 0$, we obviously have

\[ P_n(\ell, m; 0) = P_n(\ell)\, \delta_{\ell m}, \]

where $\delta_{\ell m}$ is the Kronecker delta. The interesting case is
$0 < |\tau| < a$, for then $E_\ell(-a, 0)$ and $E_m(\tau - a, \tau)$ are
not independent events, since the intervals $(-a, 0)$ and
$(\tau - a, \tau)$ overlap. In this case, we have

\[ P_n(\ell, m; \tau) = \sum_{s} \mathrm{Prob}\,\{E_{\ell-s}(-a, \tau - a),\ E_s(\tau - a, 0),\ E_{m-s}(0, \tau)\} \]

if $0 < \tau < a$, and

\[ P_n(\ell, m; \tau) = \sum_{s} \mathrm{Prob}\,\{E_{m-s}(\tau - a, -a),\ E_s(-a, \tau),\ E_{\ell-s}(\tau, 0)\} \]

if $-a < \tau < 0$; the summations are over all values of the
nonnegative integer $s$ compatible with the given values
of $\ell$ and $m$. In either case, we find

\[ P_n(\ell, m; \tau) = \sum_{s=0}^{\min(\ell,m)} \frac{(n|\tau|)^{\ell+m-2s}\, [n(a - |\tau|)]^{s}}{(\ell - s)!\, s!\, (m - s)!}\, \exp(-n|\tau| - na), \qquad 0 \le |\tau| \le a. \tag{7} \]

Eq. (7) is the explicit form of the bivariate Poisson
distribution, defined by Feller¹¹ in terms of its generating
function

\[ \sum_{\ell=0}^{\infty} \sum_{m=0}^{\infty} s_1^{\ell} s_2^{m} P_n(\ell, m; \tau) = \exp\{n[s_1 s_2 (a - \tau) + s_1 \tau + s_2 \tau - \tau - a]\}, \qquad 0 \le \tau \le a. \tag{8} \]


The generating function (8) can either be calculated
directly (as pointed out by a referee) or by making the
change of variables $s_1 = e^{iu}$, $s_2 = e^{iv}$ in the expression for
the two-dimensional characteristic function

\[ E \exp[iu\,x_n(t) + iv\,x_n(t + \tau)] = \exp\Big\{n \int_{-\infty}^{\infty} \big[e^{iu\,h(t;a) + iv\,h(t+\tau;a)} - 1\big]\, dt\Big\} \]
\[ = \exp\{n[\tau(e^{iu} - 1) + (a - \tau)(e^{i(u+v)} - 1) + \tau(e^{iv} - 1)]\} \]

given by Rice.¹² (In writing the last expression, we
have simplified (2) by setting the normalization factor
$1/\sqrt{n} = 1$ and dropping the centering constant
$-\sqrt{n}\,a$. Then, $P_n(\ell, m; \tau)$ means the joint probability
that $x_n(t) = \ell$, $x_n(t + \tau) = m$ instead of the joint probability
that $x_n(t) = y_{\ell,n}$, $x_n(t + \tau) = y_{m,n}$.)

Since $C(\tau)$, as given by (3), must be the correlation
function of every $x_n(t)$, it follows that

\[ \sum_{\ell,m=0}^{\infty} y_{\ell,n}\, y_{m,n}\, P_n(\ell, m; \tau) = C(\tau), \qquad 1 \le n < \infty. \tag{9} \]

11 Feller, op. cit., see p. 261.
12 Rice, op. cit., see p. 245.


For $\tau = 0$ or $|\tau| > a$, it is trivial to verify (9) directly,
but for $0 < |\tau| < a$, (9) becomes highly nontransparent.
A typical identity derivable from (9) is

aol 
(€ —s)!s!(m—s)! 


min(¢,m) 


= 3 exp (3/2). 


Oe = 
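Although (9) is not transparent analytically for $0 < |\tau| < a$, it is easy to check numerically. The sketch below is an illustration, not the author's calculation: the function names, the parameter values and the truncation point are assumptions. It evaluates the bivariate distribution as the sum over the shared count $s$ written in (7) and compares the resulting lagged product with $C(\tau) = a - |\tau|$.

```python
import math

def bivariate_pmf(l: int, m: int, n: int, a: float, tau: float) -> float:
    """Joint probability of l and m shots in two windows of width a offset by tau, |tau| <= a."""
    u = n * abs(tau)             # mean count in each non-shared piece
    v = n * (a - abs(tau))       # mean count in the shared piece
    total = 0.0
    for s in range(min(l, m) + 1):
        total += (u ** (l + m - 2 * s) * v ** s
                  / (math.factorial(l - s) * math.factorial(s) * math.factorial(m - s)))
    return total * math.exp(-u - n * a)

def correlation(n: int, a: float, tau: float, m_max: int = 40) -> float:
    """Left side of (9), truncated to l, m <= m_max."""
    def y(m: int) -> float:
        return m / math.sqrt(n) - math.sqrt(n) * a
    return sum(y(l) * y(m) * bivariate_pmf(l, m, n, a, tau)
               for l in range(m_max + 1) for m in range(m_max + 1))

print(correlation(4, 0.25, 0.1), 0.25 - 0.1)   # the two numbers should agree closely
```

The truncation at `m_max = 40` is ample for the small values of $na$ used here; larger $na$ would call for a larger cutoff.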


C. Trivariate Distribution 
It is clear by now how the construction proceeds in
general. Therefore, we calculate only the probability
$P_n(k, \ell, m; \tau, \tau')$ of the joint event that $x_n(t) = y_{k,n}$,
$x_n(t + \tau) = y_{\ell,n}$ and $x_n(t + \tau + \tau') = y_{m,n}$, assuming that
$0 < \tau + \tau' < a$. In this case, we have

\[ P_n(k, \ell, m; \tau, \tau') = \mathrm{Prob}\,\{E_k(-a, 0),\ E_\ell(\tau - a, \tau),\ E_m(\tau + \tau' - a, \tau + \tau')\} \]
\[ = \sum_{r,s} \mathrm{Prob}\,\{E_{k-r-s}(-a, \tau - a),\ E_r(\tau - a, \tau + \tau' - a),\ E_s(\tau + \tau' - a, 0),\ E_{\ell-r-s}(0, \tau),\ E_{m-\ell+r}(\tau, \tau + \tau')\}, \]

where we have gone over to nonoverlapping events, and
the summation is over all values of the nonnegative
integers $r$ and $s$ compatible with the given values of $k$, $\ell$
and $m$. Specifically, we find

\[ P_n(k, \ell, m; \tau, \tau') = \sum_{s=0}^{\min(k,\ell,m)} \ \sum_{r=-\min(m-\ell,0)}^{\min(k-s,\ell-s)} \frac{(n\tau)^{k-r-s}}{(k-r-s)!}\, \frac{(n\tau')^{r}}{r!}\, \frac{[n(a - \tau - \tau')]^{s}}{s!}\, \frac{(n\tau)^{\ell-r-s}}{(\ell-r-s)!}\, \frac{(n\tau')^{m-\ell+r}}{(m-\ell+r)!} \]
\[ \cdot \exp(-n\tau - n\tau' - na). \]
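The double sum can be evaluated directly as a product of five independent Poisson probabilities, one for each of the nonoverlapping intervals. The following sketch is illustrative only: the function names and parameter values are assumptions, and it presumes $\tau > 0$, $\tau' > 0$ and $\tau + \tau' < a$.

```python
import math

def pois(k: int, lam: float) -> float:
    """Poisson probability of k events with parameter lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def trivariate_pmf(k, l, m, n, a, tau, tau2):
    """Joint probability of k, l, m shots in three windows of width a at lags 0, tau, tau + tau2."""
    total = 0.0
    for s in range(min(k, l, m) + 1):
        for r in range(max(l - m, 0), min(k - s, l - s) + 1):
            total += (pois(k - r - s, n * tau)             # (-a, tau - a)
                      * pois(r, n * tau2)                  # (tau - a, tau + tau2 - a)
                      * pois(s, n * (a - tau - tau2))      # (tau + tau2 - a, 0)
                      * pois(l - r - s, n * tau)           # (0, tau)
                      * pois(m - l + r, n * tau2))         # (tau, tau + tau2)
    return total

# Hypothetical parameters: n = 4, a = 0.25, tau = 0.05, tau' = 0.10.
total = sum(trivariate_pmf(k, l, m, 4, 0.25, 0.05, 0.10)
            for k in range(13) for l in range(13) for m in range(13))
print(total)   # close to 1 if the truncation is adequate
```

The lower limit `max(l - m, 0)` is the same as $-\min(m - \ell, 0)$ in the formula above.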


For the three-dimensional characteristic function, we
have (with the same definition of $x_n(t)$ as in the
two-dimensional case)

\[ E \exp[iu\,x_n(t) + iv\,x_n(t + \tau) + iw\,x_n(t + \tau + \tau')] \]
\[ = \exp\Big\{n \int_{-\infty}^{\infty} \big[e^{iu\,h(t;a) + iv\,h(t+\tau;a) + iw\,h(t+\tau+\tau';a)} - 1\big]\, dt\Big\} \]
\[ = \exp\{n[\tau'(e^{iw} - 1) + \tau(e^{i(v+w)} - 1) + (a - \tau - \tau')(e^{i(u+v+w)} - 1) + \tau'(e^{i(u+v)} - 1) + \tau(e^{iu} - 1)]\}. \]

As before, the generating function can be obtained from
the characteristic function by a simple change of variables
and is found to be¹³

\[ \sum_{k,\ell,m=0}^{\infty} s_1^{k} s_2^{\ell} s_3^{m} P_n(k, \ell, m; \tau, \tau') = \exp\{n[s_1 s_2 s_3 (a - \tau - \tau') + s_1 s_2 \tau' + s_2 s_3 \tau + s_1 \tau + s_3 \tau' - \tau - \tau' - a]\}. \]


13 Inclusion of these remarks concerning characteristic functions
and generating functions was suggested by S. O. Rice.


The examples just given make it clear how to calculate
$P_n(k, \ell, m; \tau, \tau')$ for other ranges of the parameters $\tau$, $\tau'$
and how to calculate the general $N$-variate distribution
$P_n(k_1, \ldots, k_N; \tau_1, \ldots, \tau_{N-1})$. Of course, for large $N$,
explicit calculation of the $N$-variate distribution is formidable
and would require the services of a high-speed
electronic computer.


ENTROPY OF $\{x_n(t)\}$


The fact that as $n \to \infty$, $x_n(t)$ converges in distribution to
the Gaussian process $x_\infty(t)$ with the same power spectrum
follows from the work of Fortet.¹⁴ An index of the closeness
of $x_n(t)$ to $x_\infty(t)$ is furnished by the entropy of $x_n(t)$; for
simplicity, we consider only the univariate entropy of
$x_n(t)$. Since the entropy of the Gaussian random variable
$x_\infty$ with mean zero and variance $a$, as defined in the usual
way,¹⁵ is

\[ H_\infty(a) = \log \sqrt{2\pi e a}, \]

we must insist that the univariate entropy of $x_n$ converge
to $H_\infty(a)$ as $n \to \infty$.¹⁶ As we have seen, the univariate
distribution of $x_n$ is given by

\[ P_n(m) = \mathrm{Prob}\,\{x_n = y_{m,n} = m/\sqrt{n} - \sqrt{n}\,a\} = p(m; na) = \frac{(na)^m}{m!}\, e^{-na}. \]

Thus, at first it might seem that the appropriate entropy
to consider is the entropy

\[ H_n(a) = -\sum_{m=0}^{\infty} p(m; na) \log p(m; na) \tag{10} \]

of the discrete random variable $x_n$. However, the limit as
$n \to \infty$ of $H_n(a)$, as defined by (10), is independent of $a$,
since $a$ is involved only in the combination $na$, and therefore,
$H_n(a)$ cannot converge to $H_\infty(a) = \log \sqrt{2\pi e a}$. In
fact, $H_n(a)$ is actually logarithmically divergent, as the following
simple qualitative argument shows: As already
noted, as $n \to \infty$, the random variable $x_n$ converges in
distribution to the Gaussian random variable $x_\infty$ with
mean zero and variance $a$. It follows that for large $n$
most of the distribution of $x_n$ is concentrated in the


14 R. Fortet, "Random functions from a Poisson process," Proc.
of the Second Berkeley Symp. on Math. Stat. and Prob. Theory,
J. Neyman, Ed., University of California Press, Berkeley and Los
Angeles, Calif., pp. 375-385; 1951.

15 C. E. Shannon, "The Mathematical Theory of Communication,"
reprinted in the book of the same title by C. E. Shannon
and W. Weaver, University of Illinois Press, Urbana, p. 54; 1949.
Shannon's discussion of the difference between the entropy of a
discrete random variable and that of a continuous random variable
is highly relevant to the present analysis.

16 Since $x_n(t)$ and $x_\infty(t)$ are stationary, we need not retain the
argument $t$ in discussing univariate quantities.




interval $(-\sqrt{a}, +\sqrt{a})$, i.e., within one standard deviation
of zero, and that the values of $p(m; na)$ are approximately
equal in this interval. Since the possible values
of $x_n$ are a distance $1/\sqrt{n}$ apart, there are approximately
$2\sqrt{na}$ values of $p(m; na)$, all approximately equal to
$1/(2\sqrt{na})$. The resulting estimate of the entropy $H_n(a)$
of the discrete distribution $p(m; na)$ is

\[ -2\sqrt{na} \cdot \frac{1}{2\sqrt{na}} \log \frac{1}{2\sqrt{na}} = \log 2\sqrt{na}, \]

which diverges logarithmically as $n \to \infty$.
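The logarithmic growth is easy to observe numerically. The sketch below is an illustration with assumed function names and an assumed truncation rule. It evaluates the discrete entropy (10) and compares it with the rough estimate $\log 2\sqrt{na}$; since that estimate is only qualitative, the point of interest is that the gap between the two levels off as $n$ grows, so that both increase like $\tfrac{1}{2} \log n$.

```python
import math

def discrete_entropy(n: int, a: float) -> float:
    """H_n(a) of (10): entropy (natural log) of the Poisson distribution with parameter n*a."""
    lam = n * a
    m_max = int(lam + 12.0 * math.sqrt(lam) + 30)   # generous cutoff for the upper tail
    h = 0.0
    for m in range(m_max + 1):
        log_p = m * math.log(lam) - lam - math.lgamma(m + 1)
        h -= math.exp(log_p) * log_p
    return h

a = 0.25
for n in (100, 400, 1600):
    h = discrete_entropy(n, a)
    print(n, round(h, 4), round(h - math.log(2.0 * math.sqrt(n * a)), 4))
```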

The appearance of this difficulty is not surprising since
the entropy of a continuous random variable is defined
as $-\int_{-\infty}^{\infty} p(\xi) \log p(\xi)\, d\xi$, where $p(\xi)$ is the probability
density of the random variable, whereas the discrete
random variable $x_n$ used to define $H_n(a)$ has no probability
density, in spite of the fact that $x_n$ converges in distribution
to the continuous random variable $x_\infty$. What has
to be done is to replace the random variable $x_n$ by a
related continuous random variable $x_n^*$ which converges
to $x_\infty$ in density as well as in distribution.¹⁷ To achieve
this, we define $x_n^*$ as the continuous random variable with
the probability density

\[ p_n(\xi) = \sum_{m=0}^{\infty} \sqrt{n}\, p(m; na)\, h(\xi - y_{m,n}; 1/\sqrt{n}), \tag{11} \]

where $h$ is the step function defined by (1), and $y_{m,n} =
m/\sqrt{n} - \sqrt{n}\,a$ as usual. In other words, we replace the
discrete random variable $x_n$ by the continuous random
variable $x_n^*$, where the "mass" formerly concentrated at
the point $y_{m,n}$ is now uniformly distributed over the
interval $(y_{m,n}, y_{m+1,n})$. Of course, this requires adjustment
of the step height, which accounts for the factor $\sqrt{n}$ in
(11). Using a local central limit theorem,¹⁸ we see that $x_n^*$
converges in density to the Gaussian random variable
$x_\infty$ as $n \to \infty$. The entropy of $x_n^*$ is defined in the usual
way as

\[ H_n^*(a) = -\int_{-\infty}^{\infty} p_n(\xi) \log p_n(\xi)\, d\xi. \]

Substituting for $p_n(\xi)$ from (11), we find that

\[ H_n^*(a) = -\sum_{m=0}^{\infty} p(m; na) \log p(m; na) - \log \sqrt{n}, \]

so that the logarithmic divergence of $H_n(a)$ is cancelled out
and the dependence of $H_\infty(a)$ on $a$ is restored. We should


17 As usual, we say that the random variable $x_n$ converges in
distribution as $n \to \infty$ to the random variable $x_\infty$ if $F_n(\xi)$, the
distribution function of $x_n$, converges to $F_\infty(\xi)$, the distribution
function of $x_\infty$, at every continuity point of the latter. If $x_n$ and
$x_\infty$ have probability densities $p_n(\xi) = (d/d\xi)F_n(\xi)$ and $p_\infty(\xi) =
(d/d\xi)F_\infty(\xi)$, respectively, we say that $x_n$ converges in density to
$x_\infty$ if $p_n(\xi) \to p_\infty(\xi)$ almost everywhere. (Other definitions are
possible.) Convergence in density implies convergence in distribution,
but not conversely.

18 B. V. Gnedenko and A. N. Kolmogorov, "Limit Distributions
for Sums of Independent Random Variables," translated by K. L.
Chung, Addison-Wesley Publishing Co., Cambridge, Mass., p. 233;
1954.




now have $H_n^*(a) \to H_\infty(a) = \log \sqrt{2\pi e a}$ as $n \to \infty$. That
this is the case is shown in Table I, where we give the results
of numerical calculations of $H_n^*(a)$ for the value $a = 0.25$
used in constructing Figs. 1-3. (All logarithms are to the
base $e$.) We see that as $n$ increases, $H_n^*(0.25)$ rapidly
approaches the limiting value $H_\infty(0.25) = \log \sqrt{\pi e/2}$.¹⁹
The univariate entropies of the processes $x_1(t)$, $x_4(t)$ and
$x_{16}(t)$ represented in Figs. 1-3 are all rather close to the
limiting value of 0.726, despite the fact that these cases
correspond to low, medium and high density shot noise,
respectively. However, the univariate entropy still seems
to be a useful supplement to the power spectrum as an
index of the structure of non-Gaussian random processes.
(It will be recalled that all information about the relative
phases of the harmonic components of a process is
suppressed in its power spectrum, which limits the utility of
the power spectrum as a means of characterizing
non-Gaussian processes.)
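The smoothed entropy $H_n^*(a)$ is straightforward to evaluate by machine. The following sketch is not the original calculation, which was done by hand from tables; the function names and the truncation rule are assumptions. It computes the discrete entropy of the Poisson distribution with parameter $na$ and subtracts $\log \sqrt{n}$.

```python
import math

def discrete_entropy(lam: float) -> float:
    """Entropy (natural log) of the Poisson distribution with parameter lam."""
    m_max = int(lam + 12.0 * math.sqrt(lam) + 30)   # generous cutoff for the upper tail
    h = 0.0
    for m in range(m_max + 1):
        log_p = m * math.log(lam) - lam - math.lgamma(m + 1)
        h -= math.exp(log_p) * log_p
    return h

def smoothed_entropy(n: int, a: float) -> float:
    """H_n^*(a) = H_n(a) - log sqrt(n)."""
    return discrete_entropy(n * a) - 0.5 * math.log(n)

a = 0.25
limit = 0.5 * math.log(2.0 * math.pi * math.e * a)   # log sqrt(2*pi*e*a)
for n in (1, 2, 4, 8, 16, 32, 64, 400):
    print(n, round(smoothed_entropy(n, a), 3))
print("limit", round(limit, 3))
```

A hand check of the values for $n = 1, 2, 4$ agrees with the first three entries of Table I to within rounding; the entries for larger $n$ were not checked against the table here.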


19 Note, however, the initial decrease of $H_n^*(0.25)$.


TABLE I

n         H_n^*(0.25)
1         0.617
2         0.581
4         0.612
8         0.664
16        0.700
32        0.714
64        0.721
400²¹     0.722
∞         0.726

ACKNOWLEDGMENT

The author wishes to thank P. Elias for his very helpful
remarks concerning the entropy of $\{x_n(t)\}$.

20 The numerical work was done by Miss P. A. Smith, using
E. C. Molina's tables, "Poisson's Exponential Binomial Limit,"
D. Van Nostrand Co., Inc., New York, N. Y.; 1942.

21 The value 400 corresponds to the largest value of $p(m; na)$
available in Molina's tables.


On a Characterization of Processes for which Optimal
Mean-Square Systems are of Specified Form*


A. V. BALAKRISHNAN, MEMBER, IRE 


Summary—This paper presents the first results in an uncon- 
ventional approach to the problem of mean-square optimization. 
Instead of obtaining a representation for the optimal operator for 
a process, we seek to characterize the class of processes for which 
the optimal operator is of specified form. If the processes are given, 
so that the multivariate characteristic functions are known, then 
our results can be used to tell whether it is possible for the 
optimal operator to have a specified form. The bulk of the paper 
pertains to the signal extraction problem where the signal and 
noise are independent and additive, and it is desired to estimate 
some function of the signal. Here, with a slight shift in viewpoint, 
we phrase the characterization problem in the following way: 
Given, for example, a noise process, determine the class of signal 
processes for which the optimal extraction system is of specified 
form. The case where the noise process is Gaussian comes in for 
special attention. 


I. INTRODUCTION 


THE central problem in any mean-square optimization
can be stated as follows: A noisy signal $Y(t)$,
for $t$ belonging to a parameter set $T$, where $T$ may
be discrete or continuous, is observed and it is desired to


* Received by the PGIT, November 11, 1958; revised manu- 
script received, November 25, 1959. 
+ Space Technology Labs., Los Angeles, Calif. 


obtain the best estimate (in the mean-square sense) of
the true signal $X(t)$ or some function of it, for example
$Q[X(t_0)]$. Now, it is well-known that the optimal estimator
is given by the conditional expectation

\[ E\{Q[X(t_0)] \mid Y(t),\ t \in T\}. \]

This represents in general a nonlinear operation on $Y(t)$,
and the problem has largely been studied under the
restriction that the operations be linear. The reasons for
this are many, not the least of which is the difficulty in
determining the optimal nonlinear operator in usable
form. Series expansions in canonical functions of some
sort (polynomials for the most part) would appear to be
unavoidable in order to represent the optimal operator in
the general case [1-3]. As a possible alternate approach to
this problem, we reverse the point of view and seek to
characterize the class of processes for which a given system
or class of systems is optimal.¹ For example, we may
characterize the class of processes for which the optimal
system is specified in terms of a finite number of the

1 Our interest in the problem was rekindled during an exchange
with L. A. Zadeh. It is a pleasure to acknowledge our indebtedness
to him.




canonical functions chosen for representation. (In this
paper we consider only polynomials.) An advantage of
the approach, apart from the intrinsic interest, is that the
characterization problem turns out to be a linear one in
terms of the characteristic functions of the distributions
involved.

Given a random process, what shall we mean by a
characterization of the process? If the parameter set is a
finite set of discrete points, then, of course, the determination
of the multivariate density or distribution will
characterize the process completely. The characteristic
function, being the Fourier transform of the distribution,
will also be equally sufficient, since it determines the
distribution uniquely. The basic relations derived in this
paper are in terms of the characteristic functions. When
the parameter set is infinite, as in the case of a discrete
or continuous parameter stochastic process, although the
characterization is, in principle, still possible in terms of an
enumeration of all the multivariate joint distributions, in
practice we have to make simplifying hypotheses allowing
reduction in the variables to be specified. For example,
for a Gaussian process the mean and correlation functions
are sufficient. Alternately, we may assume that the process
is Markoffian, and so on.

The characteristic functions, to be sure, lack physical
significance and are not usually experimentally measured.
On the other hand, the use of characteristic functions is
only an intermediate step in the problem, an analytical
tool rather than an end in itself.

One somewhat unexpected result of our theory is that
under certain conditions there will only be one process
for which a given system is optimal. This opens the
possibility, for instance, of distinguishing between signals
based on the system for which each is optimal.

We begin in Section II with a set of necessary and
sufficient conditions for multidimensional distributions
in order that the optimal mean-square estimator be of
specified form. (We obtain this in terms of the characteristic
functions which do not appear to be quite amenable
to physical interpretations. Indeed, if physical intuition
is inadequate in the "forward" problem, it is even more
so in the "backward" problem considered here.) This
affords a means of checking whether or not a given process
can lead to an optimal system of given form and has
potential applications to all prediction and filtering
problems. In this paper, however, we confine our attention
to the signal extraction problem where the signal and
noise processes are additive and it is desired to obtain the
optimal mean-square estimate of some function of the
signal. Here we seek to characterize the class of signal
processes for a given noise process and a given optimal
extraction filter. These process applications are given in
Section III. We consider the zero-memory filters in some
detail since they are not without importance in themselves
and indicate, moreover, the type of methods applicable
and the type of results to be expected in general. Our
discussion of the general case is confined to the optimal
filters which are linear.


Balakrishnan: Optimal Mean-Square Systems 




II. CHARACTERIZATION OF MULTIVARIATE DISTRIBUTIONS
FOR OPTIMAL MEAN-SQUARE
ESTIMATOR OF PRESCRIBED FORM


We begin by examining the conditions under which a
multivariate distribution leads to an optimal estimator
of prescribed form. By focusing attention on the corresponding
characteristic functions, we obtain a set of
usable necessary and sufficient conditions. We consider a
finite number of (real) random variables $x_0, y_0, y_1, \ldots, y_n$,
where $x_0$ is to be estimated in terms of $y_0, y_1, \ldots, y_n$. As
is well-known (see [5], for example), the optimal mean-square
estimate is then, of course, given by (the conditional
expectation)

\[ E[x_0 \mid y_0, \ldots, y_n]. \tag{1} \]

In general, this is a Borel measurable function (in $n + 1$
real variables). Let the characteristic function of the $n + 1$
variables $y_0, \ldots, y_n$ be denoted $C_y(t_0, \ldots, t_n)$, so that

\[ C_y(t_0, \ldots, t_n) = E\Big[\exp i \sum_{j=0}^{n} t_j y_j\Big]. \]

Then $C_y(\cdots)$ is a uniformly continuous function. Although
for each particular result conditions can be relaxed, we
shall now assume that all given random variables have
finite moments of all orders. Then $C_y(\cdots)$ has derivatives
of all orders and these are all again uniformly continuous.
Let $D_j$ denote the differential operator $\partial/\partial(it_j)$, so that

\[ D_j C_y(t_0, \ldots, t_n) = \frac{\partial}{\partial(it_j)}\, C_y(t_0, \ldots, t_n). \]

(If we wish to use the formalism of the theory of linear
operators on Banach spaces, we can take the space to be
that of uniformly continuous functions on the Euclidean
space $E_{n+1}$, under the uniform norm, so that each $D_j$ is a
linear operator with dense domain.) In any case, any
polynomial $P(D_0, \ldots, D_n)$ is then also a well-defined linear
operator, where $P(\cdots)$ is any polynomial in $(n + 1)$
variates. The class of operator functions can be enlarged
by taking limits of sequences of polynomial operators.
This is as far as we shall go, since, for our purposes, the
polynomials are the only ones on which we can have a
direct and independent hold.

We begin with a theorem which is basic in our work. 
It is an extension of a result known in the special case 
where the estimator is linear, and the signal function 
estimated is also linear. (See [4], for example, where 
further reference may be found.) 

Theorem 1: Let $P(\cdots)$ be a polynomial in $(n + 1)$
variates (or, more generally, an entire function). Then, a
necessary and sufficient condition for²

\[ E[x_0^k \mid y_0, \ldots, y_n] = P(y_0, \ldots, y_n) \tag{2} \]


2 All equalities of random variables are understood to be with 
probability one. 


is that

\[ \frac{\partial^k}{\partial(is_0)^k}\, C_{x,y}(s_0, t_0, \ldots, t_n)\Big|_{s_0 = 0} = P(D_0, \ldots, D_n)\, C_y(t_0, \ldots, t_n), \tag{3} \]

where $C_{x,y}(s_0, t_0, \ldots, t_n)$ is the characteristic function of
$x_0, y_0, y_1, \ldots, y_n$.
Proof: We use a characteristic property of conditional
expectations. (See [5] for example, p. 22.) This is to the
effect that if $g(\cdots)$ is any Borel measurable function in
$(n + 1)$ variables and

\[ E\,|x_0^k\, g(y_0, \ldots, y_n)| < \infty, \]

then

\[ E[x_0^k\, g(y_0, \ldots, y_n)] = E\{g(y_0, \ldots, y_n)\, E[x_0^k \mid y_0, \ldots, y_n]\}. \tag{4a} \]

Specializing this to the case where

\[ g(y_0, \ldots, y_n) = \exp i \sum_{j=0}^{n} t_j y_j, \]

we have

\[ E\Big[x_0^k \exp i \sum_{j=0}^{n} t_j y_j\Big] = E\Big\{\Big(\exp i \sum_{j=0}^{n} t_j y_j\Big) E[x_0^k \mid y_0, \ldots, y_n]\Big\}. \tag{4b} \]

Since $k$th moments are finite, we can differentiate
inside the expected value integral on the left in (4b), so
that

\[ E\Big[x_0^k \exp i \sum_{j=0}^{n} t_j y_j\Big] = \frac{\partial^k}{\partial(is_0)^k}\, C_{x,y}(s_0, t_0, \ldots, t_n)\Big|_{s_0 = 0}. \tag{5} \]

Next, let (2) hold with $P(\cdots)$ a polynomial. Again, since
all moments up to the degree of $P(\cdots)$ are finite (as we
are assuming), we can differentiate inside
the expected value integral on the right side of (4b) so
that we have

\[ E\Big[P(y_0, \ldots, y_n) \exp i \sum_{j=0}^{n} t_j y_j\Big] = P(D_0, \ldots, D_n)\, C_y(t_0, \ldots, t_n). \tag{6} \]

Combining (4b), (5) and (6), we obtain (3). Conversely,
suppose (3) is true. Since the necessary moments are
assumed finite, we obtain (6) and (5). Substituting in
(4b), we have

\[ E\Big\{E[x_0^k \mid y_0, \ldots, y_n] \exp i \sum_{j=0}^{n} t_j y_j\Big\} = E\Big\{P(y_0, \ldots, y_n) \exp i \sum_{j=0}^{n} t_j y_j\Big\}. \]

Since this holds for every choice of $\{t_j\}$, (2) follows from
the uniqueness theorem for Fourier-Stieltjes transforms.
Extension to entire functions can be made by taking
limits of polynomials.
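As an illustration of how condition (3) is used, consider the simplest case, which is not worked in the text: $k = 1$, $n = 0$, and $x_0$, $y_0$ jointly Gaussian with zero means. The variances $\sigma_x^2$, $\sigma_y^2$, the covariance $c$ and the coefficient $\alpha$ are symbols introduced here for the example only. With a linear trial estimator $P(y_0) = \alpha y_0$, condition (3) recovers the familiar regression coefficient:

```latex
\begin{align*}
C_{x,y}(s_0, t_0) &= \exp\!\big[-\tfrac{1}{2}\big(\sigma_x^2 s_0^2 + 2 c\, s_0 t_0 + \sigma_y^2 t_0^2\big)\big],
\qquad C_y(t_0) = \exp\!\big[-\tfrac{1}{2}\sigma_y^2 t_0^2\big], \\
\frac{\partial}{\partial(i s_0)}\, C_{x,y}(s_0, t_0)\Big|_{s_0 = 0}
  &= \frac{1}{i}\,(-c\, t_0)\, C_y(t_0) = i\, c\, t_0\, C_y(t_0), \\
P(D_0)\, C_y(t_0) = \alpha\, \frac{1}{i}\, \frac{d}{dt_0}\, C_y(t_0)
  &= \alpha\, \frac{1}{i}\,(-\sigma_y^2 t_0)\, C_y(t_0) = i\, \alpha\, \sigma_y^2\, t_0\, C_y(t_0).
\end{align*}
```

The two sides agree for all $t_0$ exactly when $\alpha = c/\sigma_y^2$, so that $E[x_0 \mid y_0] = (c/\sigma_y^2)\, y_0$ in this Gaussian case.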


We next consider the special case where

\[ y_j = x_j + N_j \]

and the $x_j$, $N_j$ are stochastically independent. We then have
Theorem 2.

Theorem 2: Let $Q(\cdot)$ be a polynomial in one variable
and let $P(\cdots)$ be a polynomial in $(n + 1)$ variates. Then
a necessary and sufficient condition that

\[ E[Q(x_0) \mid y_0, y_1, \ldots, y_n] = P(y_0, y_1, \ldots, y_n) \tag{7} \]

is that

\[ C_N(t_0, \ldots, t_n)\, Q(D_0)\, C_x(t_0, \ldots, t_n) = P(D_0, \ldots, D_n)\, [C_x(t_0, \ldots, t_n)\, C_N(t_0, \ldots, t_n)]. \tag{8} \]

Proof: We first note that

\[ C_{x,y}(s_0, t_0, \ldots, t_n) = C_N(t_0, \ldots, t_n)\, C_x(s_0 + t_0, t_1, \ldots, t_n), \]

so that

\[ \frac{\partial^k}{\partial(is_0)^k}\, C_{x,y}(s_0, t_0, \ldots, t_n)\Big|_{s_0 = 0} = C_N(t_0, \ldots, t_n)\, D_0^k\, C_x(t_0, \ldots, t_n). \]

Let $Q(x) = \sum_k a_k x^k$; then

\[ E[Q(x_0) \mid y_0, \ldots, y_n] = \sum_k a_k\, E[x_0^k \mid y_0, \ldots, y_n], \]

so that, using (5) term by term, and substituting in (3),
we have

\[ C_N(t_0, \ldots, t_n)\, Q(D_0)\, C_x(t_0, \ldots, t_n) = P(D_0, \ldots, D_n)\, C_y(t_0, \ldots, t_n). \]

But since the $x_j$, $N_j$ are independent,

\[ C_y(t_0, \ldots, t_n) = C_x(t_0, \ldots, t_n)\, C_N(t_0, \ldots, t_n), \]

so that (8) follows. Extensions to the case where $Q(\cdot)$,
$P(\cdots)$ are entire functions can be made by taking limits.

Theorems 1 and 2 have assumed the optimal estimator
to be a polynomial or limits of such because then the
corresponding linear operator on the characteristic functions
can be defined independently without bringing in
the distributions. This can, of course, be extended to
estimators which are not necessarily entire functions
provided there is still a sequence of polynomials converging
to the optimal estimator in the mean of order two. The
latter would be the case, for instance, if the polynomials
are complete in the $L_2$ space induced by the joint
distribution of the $\{y_j\}$. However, since this is not true without
additional assumptions and since, in this paper, we
intend only to indicate the possibilities of our approach,
we confine ourselves more or less exclusively to
polynomials in the next Section.


III. PROCESS APPLICATIONS


Theorem 1 would appear to be of potential application
to any prediction or extraction problem. In the first
place, if the problem can be reduced to one involving
finite-dimensional distributions³ and it is desired to check
whether it is possible for the system to be of specified
form, then Theorem 1 can be used directly to provide the
answer. However, shifting to a larger view, we now consider
the class of processes for which a given system (or
class of systems) is optimal. Moreover, we restrict ourselves
in what follows to the signal extraction problem where
the signal and noise processes are independent and
additive, and some function of the signal is to be estimated.
Thus, let $X(t)$ represent the signal and $N(t)$ the noise
which is independent of the signal. Let $T$ be the parameter
set and

\[ Y(t) = X(t) + N(t), \qquad t \in T. \]

Let $Q(\cdot)$ be a (Borel) function. We now phrase the
characterization problem in the following way: For a
given noise process $N(t)$, characterize the class of signal
processes $X(t)$ for which the optimal estimator

\[ E\{Q[X(t_0)] \mid Y(t),\ t \in T\} \]

is specified in form. In this paper, we consider only the
case where $Q(\cdot)$ is a polynomial, and the sample-point
set $T$ is (or can more or less be reduced to be) finite, and
moreover, the optimal estimator is also a polynomial. It
is then clear from (8) that this would amount to solving
a partial differential equation for $C_x(t_0, \ldots, t_n)$ for given
$C_N(t_0, \ldots, t_n)$. The important feature is that the
differential equation is linear.


Zero-Memory Filters 


We begin with optimal zero-memory filters. These are
the simplest to consider since they involve only one
sample point and, hence, only first order distributions.
Nevertheless, they illustrate the type of results to be 
expected in the general case. Moreover, they constitute 
the essential part of an important class of nonlinear 
filters which consist of a zero-memory device sandwiched 
between two linear filters [3]. 

Let $Q(\cdot)$, $P(\cdot)$ be polynomials. Then, for any $t_0$, we
wish to consider the class of processes for which

\[ E\{Q[X(t_0)] \mid Y(t_0)\} = P[Y(t_0)]. \tag{9} \]

Specializing Theorem 2, this means that we must have

\[ C_N(t)\, Q(D)\, C_x(t) = P(D)[C_x(t)\, C_N(t)], \tag{10} \]

where $C_x(t)$, $C_N(t)$ are the characteristic functions of
$X(t_0)$ and $N(t_0)$, respectively, and $D = (1/i)\, d/dt$. For
given $C_N(t)$, (10) is then an ordinary differential equation
for $C_x(t)$, and can be solved subject to the condition that
the solution be a characteristic function. Before discussing
(10) in full generality, we shall first explore some simple
cases. First, let

\[ P(D) = aD, \tag{11} \]


3 The initial Laguerre transformation in [2] has this effect, for 
example. 


so that $C_x(t)$ must satisfy the differential equation

$$Q(D)C_x(t) - a\,DC_x(t) = a\left[\frac{DC_N(t)}{C_N(t)}\right]C_x(t) = a[D\chi_N(t)]\,C_x(t), \tag{12}$$

$\chi_N(t)$ being the logarithm of $C_N(t)$. Further specializing Q(D) to be linear, we have (see [4]) Theorem 3.

Theorem 3: Suppose

$$E[X(t_0) \mid Y(t_0)] = aY(t_0). \tag{13}$$

Then, for some δ > 0,

$$C_x(t) = [C_N(t)]^{\alpha}, \qquad 0 \le \alpha = \frac{a}{1-a}, \qquad |t| < \delta, \tag{14}$$

where the determination of $[C_N(t)]^{\alpha}$ is such that it is positive when $C_N(t)$ is positive. Conversely, if (14) is satisfied for all t, then (13) again holds.

Proof: The proof is immediate, since (12) simplifies to

$$C_N(t)(1-a)\,DC_x(t) = a\,C_x(t)\,DC_N(t)$$

so that

$$\frac{C_x'(t)}{C_x(t)} = \frac{a}{1-a}\,\frac{C_N'(t)}{C_N(t)}, \tag{15}$$

where the primes denote differentiation with respect to t. For small enough δ, $\log C_N(t)$ can be defined so that it is real when $C_N(t)$ is positive. Hence, we obtain from (15)

$$\log C_x(t) = \left(\frac{a}{1-a}\right)\log C_N(t),$$

from which (14) follows, thus proving the necessity. The converse is clear, since we have only to retrace the steps.
For the sake of completeness and at the same time to note some of the characteristic features of the problem, we shall now detail a few examples. It is hardly necessary to add that (13) is equivalent to saying that the optimal filter is linear. Hence, (14) characterizes the processes for which the optimal zero-memory filter is linear. In practice, it is certainly no restriction to assume that $C_N(t)$ has a MacLaurin expansion in a nonzero interval about the origin. If (14) holds, this then implies that $C_x(t)$ also has a similar expansion and thus, by a known result due to Marcinkiewicz (see [6], p. 212), it follows that $C_x(t)$ for other values of t is determined uniquely. It follows by inspection that if the noise is Gaussian or Poisson, then the optimal filter is linear if, and only if, the signal is also Gaussian or Poisson, respectively. As an example of a distinctly non-Gaussian⁴ process, we mention the case where the noise is, for example, of Γ type (this terminology is found in [6], p. 215), as when

$$C_N(t) = [1 - ict]^{-\gamma_1}.$$

Then the optimal filter is linear if and only if $C_x(t)$ is also of the Γ type with

$$C_x(t) = [1 - ict]^{-\alpha\gamma_1}.$$

We note that the corresponding densities vanish on the negative real axis.

⁴ More generally, a class of distributions known as "infinitely divisible" distributions (see [6]) are left invariant; that is to say, if one of the processes has an infinitely divisible distribution, so must the other for the optimal filter to be linear. However, the only practical examples of the infinitely divisible distributions are the Gaussian and Poisson, and the Γ type already cited.
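A quick numerical check of the Poisson case may make (13) and (14) concrete. The sketch below is illustrative only: the helper names and the parameter values are assumptions introduced here, not part of the paper. It takes Poisson noise, a Poisson signal whose mean is α times the noise mean (so that the signal characteristic function is the α-th power of the noise characteristic function), and compares the conditional mean of the signal, computed by direct summation, with the linear filter aY.

```python
import math

def poisson_pmf(k, mean):
    """P[K = k] for a Poisson variable with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def conditional_mean_signal(y, signal_mean, noise_mean):
    """E[X | X + N = y] for independent Poisson X and N, summing over X = 0..y."""
    weights = [poisson_pmf(x, signal_mean) * poisson_pmf(y - x, noise_mean)
               for x in range(y + 1)]
    total = sum(weights)
    return sum(x * w for x, w in enumerate(weights)) / total

a = 0.3                            # slope of the linear filter in (13) (assumed value)
alpha = a / (1.0 - a)              # exponent appearing in (14)
noise_mean = 2.0                   # assumed value
signal_mean = alpha * noise_mean   # Poisson law raised to the power alpha

for y in range(1, 8):
    # The two printed columns should agree: E[X | Y = y] and a * y.
    print(y, conditional_mean_signal(y, signal_mean, noise_mean), a * y)
```

Choosing any other signal mean breaks the relation α = a/(1 − a) and the two columns no longer match for the chosen a.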

Let us note that in order for $C_x(t)$ given by (14) to be a characteristic function, it is necessary that 0 < a < 1. However, for any given a in this range, there need not be a characteristic function. As an example, we have only to take

$$C_N(t) = [p + qe^{it}]^n, \quad \text{with } p + q = 1, \quad p, q > 0,$$

so that

$$C_x(t) = [p + qe^{it}]^{n\alpha},$$

and in this case it is clear that nα would have to be an integer in order for $C_x(t)$ to be a characteristic function. On the other hand, there is at most one characteristic function solution (or signal process) since the differential equation (15) is of the first order.
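The binomial remark can be illustrated numerically. The sketch below (function names and parameter values are assumptions made for this illustration) expands $(p + qz)^s$ in powers of $z = e^{it}$; the coefficient of $z^k$ would have to be the probability of the value k. It assumes q < p so that the binomial series converges at |z| = 1. For a non-integer exponent s some coefficients come out negative, so no distribution exists.

```python
def generalized_binomial(s, k):
    """Coefficient of z**k in (1 + z)**s for a real exponent s."""
    result = 1.0
    for j in range(k):
        result *= (s - j) / (j + 1)
    return result

def power_coefficients(p, q, s, terms):
    """Coefficients of z**k, k = 0..terms-1, in (p + q z)**s (assumes 0 < q < p)."""
    return [generalized_binomial(s, k) * p**(s - k) * q**k for k in range(terms)]

p, q = 0.7, 0.3
non_integer = power_coefficients(p, q, 1.5, 6)   # e.g. n = 3 with alpha = 0.5
integer = power_coefficients(p, q, 3.0, 6)       # e.g. n = 3 with alpha = 1

print(non_integer)   # contains a negative entry, so not a probability law
print(integer)       # nonnegative entries: an ordinary binomial law
```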

It should be pointed out that saying that the optimal filter is linear implies that there is a lower bound to the error in filtering; this error is readily seen to be

$$\frac{[\text{variance of } X(t_0)]\,[\text{variance of } N(t_0)]}{\text{variance of } X(t_0) + \text{variance of } N(t_0)}.$$

The features of the problem that we have noted when the optimal filter is required to be linear carry over, mutatis mutandis, to the nonlinear situation. Unfortunately, it is not possible to state general results covering all nonlinearities postulated, and this is a phenomenon familiar in the theory of nonlinear systems. We shall therefore illustrate the type of results to be expected with examples.

We shall first consider (12) to show that it is not always possible to find characteristic function solutions for any choice of type of optimal filter. Thus, let the filter be nonlinear so that the polynomial Q(D) in (12) is of an order higher than one. Let us take the noise to be Gaussian, so that

$$C_N(t) = \exp\left(-\frac{\lambda^2 t^2}{2}\right).$$

Then we have, from (12),

$$Q(D)C_x(t) - a\,DC_x(t) = a\lambda^2(it)\,C_x(t). \tag{16}$$

This equation can be solved explicitly, since the left-hand side has constant coefficients. Thus, let

$$Q(D) = \sum_{1}^{n} b_k D^k.$$

Then, letting f(x) be such that

$$\int_{-\infty}^{\infty} e^{itx} f(x)\,dx = C_x(t),$$

we have, by taking (inverse) Fourier transforms in (16), that

$$\left[\sum_{1}^{n} b_k x^k - ax\right] f(x) = -a\lambda^2\,\frac{d}{dx}f(x)$$

and hence

$$f(x) = C\exp\left(-\int_0^x \frac{\sum_{1}^{n} b_k y^k - ay}{a\lambda^2}\,dy\right). \tag{17}$$

It follows from this at once that n [n > 2] has to be odd and further that $(b_n/a)$ must be positive, and moreover, that this is enough. The constant C in (17) has, of course, to be chosen so that f(x) integrates to unity. Comparison of the density function (17) with the Gaussian shows that it is steeper than the Gaussian, going to zero more rapidly. However, physical intuition appears to be inadequate to pin down the precise nature of the distribution of $X(t_0)$. It is also readily shown from (16) that the solution obtained is the only possible distribution (with all moments finite).
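The density (17) can be tested numerically for a cubic Q. The sketch below is a hedged illustration: it assumes Q(y) = b1·y + b3·y³ with hypothetical coefficient values, builds the unnormalised density from (17), and evaluates E[Q(X) | Y = y] for Gaussian noise of variance λ² by a plain Riemann sum. If the reconstruction is right the result should equal a·y.

```python
import math

def signal_density_unnormalised(x, a, lam, b1, b3):
    """Density (17) up to the constant C, for Q(y) = b1*y + b3*y**3."""
    integral = b1 * x**2 / 2 + b3 * x**4 / 4 - a * x**2 / 2
    return math.exp(-integral / (a * lam**2))

def conditional_mean_of_Q(y, a, lam, b1, b3, half_width=12.0, steps=24001):
    """E[Q(X) | Y = y] for Y = X + N, N ~ Normal(0, lam^2), by a Riemann sum."""
    dx = 2 * half_width / (steps - 1)
    num = den = 0.0
    for j in range(steps):
        x = -half_width + j * dx
        weight = (signal_density_unnormalised(x, a, lam, b1, b3)
                  * math.exp(-(y - x) ** 2 / (2 * lam**2)))
        num += (b1 * x + b3 * x**3) * weight
        den += weight
    return num / den   # the step dx cancels between numerator and denominator

a, lam, b1, b3 = 0.5, 1.0, 1.0, 0.2   # assumed values; n = 3 is odd and b3/a > 0
for y in (-2.0, 0.5, 1.5):
    print(y, conditional_mean_of_Q(y, a, lam, b1, b3), a * y)
```

An even-degree Q, or a negative ratio b_n/a, makes the exponent in (17) grow without bound and the Riemann sum no longer represents a normalisable density.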

As another example of (12), we may take a discrete distribution and let $N(t_0)$ be Poisson so that

$$C_N(t) = \exp \lambda[e^{it} - 1].$$

Then let

$$C_x(t) = \sum_{0}^{\infty} \beta_m e^{imt}.$$

If we substitute this into (12), we obtain

$$\left[\sum_{1}^{n} b_k m^k - am\right]\beta_m = [a\lambda]\,\beta_{m-1}, \qquad m = 1, 2, \cdots. \tag{18}$$

This equation determines $\{\beta_m\}$ and it is readily seen that we require

$$\frac{\sum_{1}^{n} b_k m^k - am}{a} > 0$$

for every positive integer m and, moreover, this condition is sufficient.
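The recurrence (18) is easy to run. The following sketch, with hypothetical names and assumed parameter values, iterates (18) from an arbitrary starting weight, normalises, and checks the special case Q(D) = D, where the signal should come out Poisson with mean aλ/(1 − a) in agreement with Theorem 3.

```python
import math

def signal_pmf_from_recurrence(b, a, lam, terms=40):
    """Iterate (18); `b` maps each power k to the coefficient b_k of Q."""
    beta = [1.0]                       # arbitrary start, fixed by normalisation below
    for m in range(1, terms):
        bracket = sum(b_k * m**k for k, b_k in b.items()) - a * m
        if bracket / a <= 0:
            raise ValueError("positivity condition fails at m = %d" % m)
        beta.append(a * lam * beta[-1] / bracket)
    total = sum(beta)                  # truncated normalising constant
    return [value / total for value in beta]

a, lam = 0.3, 2.0                      # assumed values
pmf = signal_pmf_from_recurrence({1: 1.0}, a, lam)   # Q(D) = D
mean = a * lam / (1.0 - a)
poisson = [math.exp(-mean) * mean**m / math.factorial(m) for m in range(5)]

print(pmf[:5])
print(poisson)
```

Passing, say, `{1: 1.0, 2: 0.5}` instead gives the discrete signal law for a quadratic Q under the same Poisson noise.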

The last two examples can be extended to functions Q(·) which are not polynomials. Thus, in (17) we have only to replace $\sum b_k y^k$ by Q(y), and similarly in (18), and the corresponding extended conditions on the function Q(·) for (17) to yield a probability density and for (18) to yield a (discrete) distribution can be readily determined.

Let us now return to the general case where Q(·) and P(·) are arbitrary polynomials. First, we note that if we set

$$Q(D) = \sum_{1}^{m} b_k D^k$$

and

$$P(D) = \sum_{1}^{n} a_k D^k,$$

we can use the Leibnitz rule for differentiating a product and rewrite (10) as

$$C_N\left[\sum_{j=1}^{m} b_j D^j C_x\right] = \sum_{j=0}^{n}\left[\sum_{k=j}^{n} a_k C_j^k\, D^{k-j} C_N\right] D^j C_x, \tag{19}$$

where

$$C_j^k = \frac{k!}{j!\,(k-j)!}$$

and $a_0$ is taken to be zero.

This is a homogeneous ordinary linear differential equation with variable coefficients and the coefficients are all bounded functions. In general, (19) can have, of course, linearly independent solutions that are characteristic functions and any convex linear combination of these will again be a characteristic function. Thus, if (19) has two linearly independent solutions that are characteristic functions, it will then have infinitely many.

We shall now illustrate the use of a technique that leads to solutions of a class of characterization problems where we are interested in invariance, that is, in the cases where both signal and noise distributions belong to the same class, Gaussian for instance. For this, we use (10) directly and appropriate polynomial expansions of the distribution involved.

Suppose, then, that the noise process is Gaussian with

$$C_N(t) = \exp\left(-\frac{\lambda_1^2 t^2}{2}\right)$$

and we are interested in the conditions on the polynomials Q(·) and P(·) that will ensure that the signal X(t) is also Gaussian. Let us, thus, assume that

$$C_x(t) = \exp\left(-\frac{\lambda_2^2 t^2}{2}\right)$$

and look at the consequences. First, let us note that

$$D^n C_N(t) = [(i\lambda_1)^n H_n(\lambda_1 t)]\exp\left(-\frac{\lambda_1^2 t^2}{2}\right)$$

where $H_n(\cdot)$ is the nth Hermite polynomial

$$H_n(x) = e^{x^2/2}(-1)^n \frac{d^n}{dx^n}\exp(-x^2/2).$$

Using the notation

$$\lambda^2 = \lambda_1^2 + \lambda_2^2,$$

we then have from (10)

$$C_N Q(D) C_x = \left[\sum_{1}^{m} b_k (i\lambda_2)^k H_k(\lambda_2 t)\right]\exp\left(-\frac{\lambda^2 t^2}{2}\right)$$

and

$$P(D)[C_N C_x] = \left[\sum_{1}^{n} a_k (i\lambda)^k H_k(\lambda t)\right]\exp\left(-\frac{\lambda^2 t^2}{2}\right).$$

Hence, we must have, for (10) to hold,

$$\sum_{1}^{n} a_k (i\lambda)^k H_k(\lambda t) = \sum_{1}^{m} b_k (i\lambda_2)^k H_k(\lambda_2 t). \tag{20}$$

This is an identity in t so that we can equate like powers of t. Clearly, there is no harm in taking m = n, since we can consider the necessary coefficients to be zero. Since the odd order Hermite polynomials are odd, and the even order polynomials even, we can separate them in (20).
To be specific, suppose n = 3. Then we have 


$$a_3\lambda^6 = b_3\lambda_2^6,$$

$$a_1\lambda^2 + 3a_3\lambda^4 = b_1\lambda_2^2 + 3b_3\lambda_2^4, \tag{21}$$

$$a_2\lambda^4 = b_2\lambda_2^4,$$

and

$$\lambda^2 a_2 = \lambda_2^2 b_2.$$

These conditions can be satisfied only if $a_2 = b_2 = 0$, omitting the trivial case of zero variance. In other words, for n = 3, P(·) and Q(·) must be odd functions. For n = 5 and higher, this, of course, need not be. We may formalize this result as Theorem 4.
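The cubic case can be verified directly. In the sketch below (hypothetical function names, assumed parameter values) the odd-order relations in (21) are solved for the filter coefficients a1 and a3 given b1, b3 and the two standard deviations, and the resulting polynomial P(y) is compared with E[Q(X) | Y = y] obtained by numerical integration for a Gaussian signal in Gaussian noise.

```python
import math

def filter_coefficients(b1, b3, lam1, lam2):
    """Solve the odd-order relations in (21), with a_2 = b_2 = 0, for (a1, a3).

    lam1 and lam2 are the noise and signal standard deviations."""
    lam_sq = lam1**2 + lam2**2
    a3 = b3 * lam2**6 / lam_sq**3
    a1 = (b1 * lam2**2 + 3 * b3 * lam2**4 - 3 * a3 * lam_sq**2) / lam_sq
    return a1, a3

def conditional_mean_of_Q(y, b1, b3, lam1, lam2, half_width=15.0, steps=30001):
    """E[b1*X + b3*X^3 | X + N = y] for zero-mean Gaussian X and N (Riemann sum)."""
    dx = 2 * half_width / (steps - 1)
    num = den = 0.0
    for j in range(steps):
        x = -half_width + j * dx
        weight = math.exp(-x**2 / (2 * lam2**2) - (y - x) ** 2 / (2 * lam1**2))
        num += (b1 * x + b3 * x**3) * weight
        den += weight
    return num / den

b1, b3, lam1, lam2 = 1.0, 0.5, 1.0, 1.5    # assumed values
a1, a3 = filter_coefficients(b1, b3, lam1, lam2)
for y in (-1.0, 0.8, 2.0):
    print(y, conditional_mean_of_Q(y, b1, b3, lam1, lam2), a1 * y + a3 * y**3)
```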

Theorem 4: Let $X(t_0)$ and $N(t_0)$ have zero means and let their variances be $\lambda_2^2$ and $\lambda_1^2$, respectively. Let $N(t_0)$ be Gaussian. Then

$$E\{Q[X(t_0)] \mid Y(t_0)\} = P[Y(t_0)]$$

where P(·) and Q(·) satisfy (20) if, and only if, $X(t_0)$ is also Gaussian.

Proof: Since we have already given the arguments for the "if" part, we need only prove necessity. Suppose then that (20) is satisfied. Then one solution of (10) is clearly obtained by taking

$$C_x(t) = \exp\left(-\frac{\lambda_2^2 t^2}{2}\right).$$

We have now to show that this is the only solution. This is given in Appendix I.

It is, of course, possible to resolve (20) into (21) for 
arbitrary ‘‘n,”’ but since this involves too much notation, 
we have refrained from doing so. 

As another example, we shall consider the case where the noise has a Γ-type distribution. Thus, let

$$C_N(t) = [1 - ict]^{-\gamma_1}; \tag{22}$$

then note that

$$D^n[C_N(t)] = \frac{c^n\,\gamma_1(\gamma_1+1)\cdots(\gamma_1+n-1)}{[1 - ict]^{\gamma_1+n}}. \tag{23}$$

Proceeding as in the previous example, we can state Theorem 5.

Theorem 5: Let $N(t_0)$ have a Γ-type distribution so that $C_N(t)$ is given by (22). Then a necessary and sufficient condition for

$$E[X(t_0)^n \mid Y(t_0)] = a\,Y(t_0)^n, \qquad n \text{ a positive integer}, \tag{24}$$

where 0 < a < 1, is that $X(t_0)$ also have a distribution of the Γ type.


Proof:
Sufficiency: Let

$$C_x(t) = [1 - ict]^{-\gamma_2}.$$

Then we shall show that for suitably chosen $\gamma_2$ (24) holds. For this, note that (10) in this case, using (22), can be written

$$c^n(\gamma-\gamma_1)(\gamma-\gamma_1+1)\cdots(\gamma-\gamma_1+n-1) = a\,c^n\,\gamma(\gamma+1)\cdots(\gamma+n-1) \tag{25}$$

where

$$\gamma = \gamma_1 + \gamma_2.$$

Since 0 < a < 1, it is clear that (25) can be solved to yield a positive γ, $\gamma > \gamma_1$, and hence, $\gamma_2 = \gamma - \gamma_1$ will serve as the exponent.

Necessity: Suppose that (24) holds. Then we know that by Theorem 2 we must have

$$C_N D^n C_x = a\,D^n[C_N C_x]. \tag{26}$$

Now, into (26), let us put

$$C_x(t) = [1 - ict]^{-\gamma_2}.$$

Then, in order for this $C_x(t)$ to be a solution of (26), it is enough if (25) is true. Now (25) considered as an equation for γ has exactly n roots, each root γ yielding $(\gamma - \gamma_1)$ as a possible value for $\gamma_2$. However, the corresponding solution will be a characteristic function if and only if $\gamma_2$ is positive. For 0 < a < 1, as given, (25) has a positive root γ, $\gamma > \gamma_1$. Moreover, this is the only such root. For, upon differentiating (25) with respect to γ, we note that the derivative is positive for all positive γ, and hence, (25) cannot have more than one positive root. This concludes the proof of the theorem.

Let it be noted that we have proved slightly more than the theorem states. Thus, the exponent $\gamma_2$ is to be calculated from (25).
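Computing the exponent from (25) is a one-dimensional root search. The sketch below is an assumption-laden illustration (the function names, the bisection strategy and the sample values are not from the paper): it cancels the common factor c^n, brackets the root to the right of γ₁, and bisects. For n = 1 the answer should reduce to γ₂ = aγ₁/(1 − a), matching (14).

```python
def rising_product(g, n):
    """g (g + 1) ... (g + n - 1)."""
    out = 1.0
    for k in range(n):
        out *= g + k
    return out

def signal_exponent(a, gamma1, n, iterations=200):
    """Return gamma2 > 0 solving (25) by bisection, assuming 0 < a < 1."""
    def residual(g):
        # (25) with the common factor c**n removed, written as LHS - RHS
        return rising_product(g - gamma1, n) - a * rising_product(g, n)

    lo, hi = gamma1, gamma1 + 1.0       # residual(lo) < 0 because the LHS vanishes
    while residual(hi) < 0:             # expand until the sign changes
        hi = gamma1 + 2 * (hi - gamma1)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) - gamma1

print(signal_exponent(0.3, 2.0, 1), 0.3 * 2.0 / 0.7)   # n = 1: closed form for comparison
print(signal_exponent(0.4, 1.5, 3))                    # n = 3: no simple closed form
```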


Filters with Memory 


In the more general case, which we now consider, the 
optimal filters will have memory. Now it is clear that our 
methods can yield information on the joint distribution 
of the processes involved for as many sample points as are 
used in the optimization, excluding trivial cases. The 
theory is naturally richer since there are more degrees of 
freedom. Here we discuss primarily the characterization 
problem for processes for which the optimal extraction 
filter is linear. The partial differential equations that arise 
are then of the first order and our result pertains to their 
solution. We again assume that the distributions have all 
finite moments of as high an order as required. 

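Theorem 6: Let

$$E[X(t_0) \mid Y(t_0), Y(t_1), \cdots, Y(t_n)] = (1 - a_0)Y(t_0) + \sum_{1}^{n} a_m Y(t_m) \tag{27}$$

where not all $\{a_i\}$ are zero, for example, $a_1 \neq 0$. Then for some δ > 0 and all $\{t_i\}$ such that

$$|t| = \sqrt{\textstyle\sum_i t_i^2} < \delta,$$

the characteristic function of $X(t_0), X(t_1), \cdots, X(t_n)$ must be of the form

$$\log C_x(t_0, t_1, \cdots, t_n) = R(t_1, c_1, \cdots, c_n) + g(c_1, c_2, \cdots, c_n) \tag{28}$$

where

$$c_1 = a_0 t_1 + a_1 t_0,$$

$$c_2 = a_2 t_1 - a_1 t_2, \tag{29}$$

$$\cdots$$

$$c_n = a_n t_1 - a_1 t_n,$$

$$R(t_1, c_1, \cdots, c_n) = -\frac{1}{a_1}\int_0^{t_1} Q\left(\frac{c_1 - a_0 t}{a_1},\ t,\ \frac{a_2 t - c_2}{a_1},\ \cdots,\ \frac{a_n t - c_n}{a_1}\right)dt,$$

$$Q(t_0, t_1, \cdots, t_n) = \left[(1 - a_0)\frac{\partial}{\partial t_0} + \sum_{1}^{n} a_k\frac{\partial}{\partial t_k}\right]\log C_N(t_0, t_1, \cdots, t_n),$$

and g(···) is an arbitrary function. Conversely, if (28) is satisfied for every $\{t_i\}$, then (27) holds.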

Proof: 

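Necessity: By Theorem 2, for (27) to hold, we must have

$$C_N(t_0, t_1, \cdots, t_n)\left[a_0\frac{\partial}{\partial t_0} - a_1\frac{\partial}{\partial t_1} - \cdots - a_n\frac{\partial}{\partial t_n}\right]C_x(t_0, t_1, \cdots, t_n)$$

$$= C_x(t_0, t_1, \cdots, t_n)\left[(1 - a_0)\frac{\partial}{\partial t_0} + a_1\frac{\partial}{\partial t_1} + \cdots + a_n\frac{\partial}{\partial t_n}\right]C_N(t_0, t_1, \cdots, t_n). \tag{30}$$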


As before, we may consider this as a (partial) differential equation for $C_x(t_0, t_1, \cdots, t_n)$ with $C_N(t_0, t_1, \cdots, t_n)$ given. The equation is of the first order and can be solved by standard methods. We include a few steps in the solution. First, we can choose δ so that for |t| < δ, both characteristic functions are never zero or negative, so that their logarithms can be defined. Next, letting

$$Q(t_0, t_1, \cdots, t_n) = \left[(1 - a_0)\frac{\partial}{\partial t_0} + a_1\frac{\partial}{\partial t_1} + \cdots + a_n\frac{\partial}{\partial t_n}\right]\log C_N(t_0, t_1, \cdots, t_n)$$

and

$$F = \log C_x(t_0, t_1, \cdots, t_n),$$

we have

$$\frac{dt_0}{a_0} = \frac{dt_1}{-a_1} = \cdots = \frac{dt_n}{-a_n} = \frac{dF}{Q}.$$


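Defining $\{c_i\}$ as in (29), we can express $t_0, t_2, t_3, \cdots, t_n$ in terms of $\{c_i\}$ and $t_1$, since $a_1$ is assumed nonzero, as follows:

$$t_0 = \frac{c_1 - a_0 t_1}{a_1}, \qquad t_2 = \frac{a_2 t_1 - c_2}{a_1}, \qquad \cdots, \qquad t_n = \frac{a_n t_1 - c_n}{a_1}. \tag{31}$$

Substituting these in $Q(t_0, t_1, \cdots, t_n)$ and setting

$$R(t_1, c_1, c_2, \cdots, c_n) = -\frac{1}{a_1}\int_0^{t_1} Q\,dt_1,$$

we obtain the general solution of (30) as

$$F = R(t_1, c_1, \cdots, c_n) + g(c_1, c_2, \cdots, c_n).$$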


Moreover, since $a_1 \neq 0$, any solution of (30) must be so expressible. Hence (28) follows. Here $g(c_1, c_2, \cdots, c_n)$ is the general solution of the homogeneous equation and has as many derivatives as $C_x(t_0, t_1, \cdots, t_n)$ has except where the latter is zero.

Conversely, suppose (28) is true. Then by direct differentiation (30) follows and by Theorem 2 (27) also follows.

Suppose now that, for example, the noise process distribution is specified. Then for arbitrarily given $\{a_i\}$ there is an X(t) process for which (27) holds, only if there is a characteristic function of the form given by (28). Suppose, in particular, that the noise is Gaussian. Then, we know that if there is a solution at all, there is always a Gaussian solution which has the same first and second moments as the given solution. On the other hand, in contrast to the one-dimensional problem, there may be more than one solution even in this case. We illustrate this with the following example. We consider an optimal extraction filter with two sample points. Thus, let

$$C_N(t_0, t_1) = \exp -\tfrac{1}{2}(t_0^2 + t_1^2)$$

and

$$G(t_0, t_1) = \exp -\tfrac{1}{2}(t_0^2 + t_1^2 + t_0 t_1).$$

Then, $G(t_0, t_1)$ satisfies (30), considered as an equation for $C_x(t_0, t_1)$ with $C_N(t_0, t_1)$ given as above, if we set

$$a_0 = 8/15, \qquad a_1 = 2/15$$

and the rest of the $\{a_i\}$ zero.
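The quoted coefficients can be reproduced from the normal equations of linear least squares. The sketch below rests on two assumptions that should be read as such: that the Gaussian signal in this example has unit variances and covariance 1/2 (the quadratic form t₀² + t₁² + t₀t₁ with a factor 1/2), and that the weight on Y(t₀) is written 1 − a₀ as in (33) and (36). The helper name is hypothetical. Exact rational arithmetic is used so that the fractions can be read off directly.

```python
from fractions import Fraction

def solve_2x2(m, rhs):
    """Solve m w = rhs for a 2 x 2 system by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    w0 = (rhs[0] * m[1][1] - m[0][1] * rhs[1]) / det
    w1 = (m[0][0] * rhs[1] - m[1][0] * rhs[0]) / det
    return w0, w1

# Assumed second moments: signal covariance from G, white unit-variance noise.
signal_cov = [[Fraction(1), Fraction(1, 2)], [Fraction(1, 2), Fraction(1)]]
noise_cov = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
obs_cov = [[signal_cov[r][c] + noise_cov[r][c] for c in range(2)] for r in range(2)]

# Normal equations: sum_k w_k E[Y_k Y_j] = E[X_0 Y_j] = E[X_0 X_j].
w0, w1 = solve_2x2(obs_cov, signal_cov[0])
a0, a1 = 1 - w0, w1        # convention: weight on Y(t_0) is written 1 - a_0

print(w0, w1)   # weights on Y(t_0) and Y(t_1)
print(a0, a1)   # coefficients in the paper's convention
```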


Next let

$$C(t_0, t_1) = G(t_0, t_1)\left[1 + \frac{a_4}{4!}\left(\frac{2t_0 + 8t_1}{15}\right)^4\right].$$

Then $C(t_0, t_1)$ also satisfies (30) with the same choice of $a_0$ and $a_1$, for every value of $a_4$. We shall now show that $C(t_0, t_1)$ is a characteristic function with a suitable value for $a_4$. For this, let

C(to, ty) a G(to, if Gs 


T = (the column vector) fy | P 
OG, ee AL 
(T” being the transpose of 7), and 
rede ee ea 0 | 
Or 15/8) bss —2/8 15/8 
SoZ S05) 
Se 30° 225 
Then 
ies pa ae 22) — 30 ae Ay Aye 
675 —30 OZ Ai2 Aoo 
Now let 
, Gx) 4 
Colto, a) ={1+ A o exp — 2Q(to, a) 


so that its inverse Fourier transform is 
ea // C(t, «) exp — w(xt + cy) dt de 
(27) "5. g 


= Gz, Wl ae 1 yl V a22 (y ar 


=| 


where $H_4(\cdot)$ is the fourth Hermite polynomial as defined before, and $G_0(x, y)$ is the Gaussian density corresponding to $\exp -\tfrac{1}{2}Q(t, \sigma)$. Since

$$\left[1 + \frac{a_4}{4!}H_4(x)\right] > 0, \qquad -\infty < x < \infty,$$

provided $0 < a_4 < 4$, it follows that $C_0(t, \sigma)$ is the Fourier transform of a nonnegative function, and since $C_0(0, 0) = 1$, it is a characteristic function. But
2to + 8t 
(leave Ost e : a 


so that $C(t_0, t_1)$ is a characteristic function also, as required.

However, under certain conditions, the linearity of the optimal filter does imply that X(t) is Gaussian also, as will be shown in Theorem 7.
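The positivity condition used in the counterexample above is simple to confirm numerically. The sketch below (hypothetical names, assumed grid) uses the fourth Hermite polynomial in the paper's normalisation, H₄(x) = x⁴ − 6x² + 3, whose minimum value is −6 at x² = 3, so the bracket 1 + (a₄/4!)H₄(x) has minimum 1 − a₄/4 and stays positive exactly when a₄ < 4.

```python
def hermite4(x):
    """Fourth Hermite polynomial, normalised with weight exp(-x^2 / 2)."""
    return x**4 - 6 * x**2 + 3

def min_correction_factor(a4, half_width=6.0, steps=12001):
    """Smallest value of 1 + (a4 / 4!) H_4(x) on a uniform grid."""
    dx = 2 * half_width / (steps - 1)
    return min(1 + a4 / 24 * hermite4(-half_width + j * dx) for j in range(steps))

for a4 in (1.0, 2.0, 3.9, 4.5):
    # The grid minimum should be close to 1 - a4 / 4.
    print(a4, min_correction_factor(a4), 1 - a4 / 4)
```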
Theorem 7: Let N(t), t ∈ π, be Gaussian and let $E[N(t)^2] \neq 0$. Then a necessary and sufficient condition that X(t), t ∈ π, be Gaussian also is that

$$E[X(t) \mid Y(t), Y(t - T_1), \cdots, Y(t - T_n)] = \hat{E}[X(t) \mid Y(t), Y(t - T_1), \cdots, Y(t - T_n)]$$

for every t ∈ π and for every choice of distinct $\{T_i\}$ and n, $(t - T_i) \in \pi$, where $\hat{E}$ denotes the conditional expectation assuming Gaussian distributions having the same means and variances as the given processes, and is the optimal linear filter.

Proof: The sufficiency part being well-known, we need only prove necessity. We prove this by induction. Suppose, then, we have proved that all n-dimensional distributions of the X(t) process are Gaussian. We shall first show that this implies that all (n + 1)-dimensional distributions are Gaussian also. We construct a Gaussian process with the same first and second moments as the given X(t) process. Let us pick any set of points $t, t - T_1, \cdots, t - T_n$ from π. Let the corresponding characteristic function of $X(t), X(t - T_1), \cdots, X(t - T_n)$ be denoted $C_x(t_0, t_1, \cdots, t_n)$. Let $G(t_0, t_1, \cdots, t_n)$ be the constructed Gaussian characteristic function corresponding to $C_x(t_0, t_1, \cdots, t_n)$. Let

$$\hat{E}[X(t) \mid Y(t), Y(t - T_1), \cdots, Y(t - T_n)] = (1 - a_0)Y(t) + a_1 Y(t - T_1) + \cdots + a_n Y(t - T_n). \tag{33}$$

Since N(t) is a Gaussian process, $G(t_0, t_1, \cdots, t_n)$ is, of course, a particular solution of (30) which $C_x(t_0, \cdots, t_n)$ must satisfy by Theorem 2. Now, not all $\{a_i\}$ in (33) can be zero. For in that case, since

$$E[N(t) \mid Y(t), \cdots, Y(t - T_n)] = Y(t) - E[X(t) \mid Y(t), Y(t - T_1), \cdots, Y(t - T_n)] = a_0 Y(t) - \sum_{1}^{n} a_k Y(t - T_k),$$

this conditional expectation would be zero also. This implies that $E[N(t)^2] = 0$, contrary to the assumption that $E[N(t)^2] \neq 0$. Hence, from Theorem 6, we have, for example for 0 < |t| < δ,

$$C_x(t_0, t_1, \cdots, t_n) = G(t_0, t_1, \cdots, t_n)\,h(t_0, t_1, \cdots, t_n) \tag{34}$$

where $h(t_0, \cdots, t_n)$ is a solution of the homogeneous equation

$$\left[a_0\frac{\partial}{\partial t_0} - a_1\frac{\partial}{\partial t_1} - \cdots - a_n\frac{\partial}{\partial t_n}\right]h(t_0, t_1, \cdots, t_n) = 0.$$

Actually, a little consideration of (30) shows that (34) holds for all $\{t_i\}$ since a particular solution valid for all $\{t_i\}$ is known. On the other hand, if, for example, $a_1$ is assumed to be nonzero, $h(t_0, \cdots, t_n)$ must be of the form

$$h(t_0, t_1, \cdots, t_n) = g(c_1, c_2, \cdots, c_n)$$

where the $\{c_i\}$ are given again by (29). If in (34) we set $t_1 = 0$ we have

$$C_x(t_0, 0, t_2, \cdots, t_n) = G(t_0, 0, t_2, \cdots, t_n)\,g(a_1 t_0, -a_1 t_2, -a_1 t_3, \cdots, -a_1 t_n).$$

But by the induction hypothesis, the left-hand side is Gaussian and hence equal to the first factor on the right. Hence

$$g(a_1 t_0, -a_1 t_2, -a_1 t_3, \cdots, -a_1 t_n) = 1$$

or

$$h(t_0, t_1, \cdots, t_n) \equiv 1$$

also. Hence $C_x(t_0, t_1, \cdots, t_n)$ is Gaussian. To complete the induction, we have only to show that first order distributions are Gaussian, and this readily follows from Theorem 3.

An obvious implication of Theorem 7 is, of course, that in the multidimensional case we must require that all finite sample point optimal filters be linear, and, of course, in the counterexample given prior to the theorem, this is not true. We may note that Theorem 7 also holds if, for instance, we replace “Gaussian” therein by “Poisson.” We have, of course, made no stationarity assumption.

If the processes are strictly stationary, we can weaken the conditions of Theorem 7, as will be shown in Theorem 8.

Theorem 8: Let N(t), X(t), −∞ < t < ∞, be strictly stationary stochastic processes. Let N(t) be Gaussian with nondegenerate second-order densities.⁵ Let X(t) be a Markoff process. Then a necessary and sufficient condition for

$$E[X(t) \mid Y(t), Y(t - T)] = \hat{E}[X(t) \mid Y(t), Y(t - T)] \tag{35}$$

for every T > 0 is that X(t) be Gaussian.
Proof: Sufficiency being again trivial, we need only prove necessity. Let $C_x(t_0, t_1)$ be the characteristic function of X(t), X(t − T) and $C_N(t_0, t_1)$ that of N(t), N(t − T). Let

$$\hat{E}[X(t) \mid Y(t), Y(t - T)] = (1 - a_0)Y(t) + a_1 Y(t - T). \tag{36}$$

Now $a_0$ and $a_1$ cannot both be zero since $E[N(t)^2] \neq 0$ by hypothesis. Let $G(t_0, t_1)$ be the Gaussian characteristic function having the same first and second moments as $C_x(t_0, t_1)$. As in Theorem 7, we know that $C_x(t_0, t_1)$ must be of the form

$$C_x(t_0, t_1) = G(t_0, t_1)\,h(a_0 t_1 + a_1 t_0).$$

However, by assumed stationarity,

$$C_x(0, t) = G(0, t)\,h(a_0 t) = C_x(t, 0) = G(t, 0)\,h(a_1 t).$$

Hence

$$h(a_1 t) = h(a_0 t) \tag{37}$$

for all t. If either $a_0$ or $a_1 = 0$, this would imply that h(t) = 1 or that $C_x(t_0, t_1)$ is Gaussian. Since both cannot be zero, we need only consider the case where neither $a_0$ nor $a_1$ is zero. Suppose now that h(t) is not a constant. Then (37) clearly implies that $a_1 = a_0$. However, this leads to a contradiction, for since

$$a_0\lambda_{00} - a_1\lambda_{01} = M_{00} + E[X(t)]E[N(t)]$$

and

$$a_0\lambda_{01} - a_1\lambda_{11} = M_{01} + E[N(t)]E[X(t - T)]$$

where

$$\lambda_{00} = E[Y(t)^2] = \lambda_{11}, \qquad \lambda_{01} = E[Y(t)Y(t - T)],$$

$$M_{00} = E[N(t)^2],$$

and

$$M_{01} = E[N(t)N(t - T)],$$

we have

$$a_0(\lambda_{00} - \lambda_{01}) = M_{00} + E[X(t)]E[N(t)]$$

and

$$a_0(\lambda_{01} - \lambda_{00}) = M_{01} + E[N(t)]E[X(t - T)],$$

so that, using stationarity again, $M_{01} = -M_{00}$ and, hence,

$$M_{00} + M_{11} + 2M_{01} = 0,$$

or the second order distribution of N(t), N(t − T) is degenerate, which is contrary to hypothesis. Hence, $C_x(t_0, t_1)$ is Gaussian, and T being arbitrary, all second order distributions are Gaussian. However, since the process is Markoffian, this implies that X(t) is a Gaussian process.

⁵ By this we mean that the matrix of second moments is positive definite.

The condition that the two-point optimal filter be linear for every T > 0 is perhaps too stringent and can probably be weakened. Indeed, in the case of discrete-parameter processes (or time series) we can prove a stronger version, as will be shown in Theorem 9.

Theorem 9: Let $N_n$, $X_n$, $-\infty < n < \infty$, be strictly stationary discrete parameter processes, let $N_n$ be Gaussian with nondegenerate second-order densities, and let $X_n$ be a Markoff process. Then a necessary and sufficient condition for

$$E[X_n \mid Y_n, Y_{n-1}] = \hat{E}[X_n \mid Y_n, Y_{n-1}] \tag{38}$$

is that $X_n$ be Gaussian also.

Proof: Sufficiency again being trivial and well-known, we need only prove necessity. Here we can make use of Theorem 8, from the proof of which we readily obtain that the joint distribution of $X_n$ and $X_{n-1}$ is Gaussian. But now the joint density of $X_n, X_{n-1}, X_{n-2}$, because $X_n$ is Markoffian, can be written

$$P(X_n, X_{n-1}, X_{n-2}) = P(X_n \mid X_{n-1})\,P(X_{n-1} \mid X_{n-2})\,P(X_{n-2}). \tag{39}$$

Now, since $P(X_n, X_{n-1})$ is Gaussian, so also is the conditional density $P(X_n \mid X_{n-1})$, and, by stationarity, so is $P(X_{n-1} \mid X_{n-2})$. Hence, all the factors in the right side of (39) are Gaussian, and, hence, so is the left side. In a similar manner, it readily follows (by induction, if necessary) that all joint distributions are Gaussian.

Extension of Theorems 8 and 9 to stationary vector Markoff processes is apparent. Indeed, the proof of Theorem 8 shows that for stationary processes, there is at most one characteristic function solution to (30) under the assumption of a nondegenerate second moment matrix for the fixed distribution, so that if there is a solution it is automatically unique. Since the second process is usually Gaussian, perhaps Theorem 8 is adequate for most practical purposes.


CONCLUSIONS 


A new approach to the least-squares optimization theory 
has been developed. Instead of determining an optimal 
system for given processes, we characterize the processes 
for which a prescribed system is optimal. If the designer 
has a particular type of system in mind, then our results 
can be used to describe the class of processes for which 
such a system will be optimal. An immediate advantage of 
this approach is that the general nonlinear problem now 
becomes a linear one—although time variant. 

In an important subclass of problems, signal and noise 
are additive and independent and some function of the 
signal is the desired output. Here we have characterized the 
class of signal processes for which a given ‘‘extraction”’ 
system is optimal, assuming that the noise process is 
known. In some cases it turns out that there is exactly 
one signal process for a given noise process and this opens 
the possibility of recognizing the signal by the kind of 
extraction system that is optimal for it. For stationary 
nth order Markoff processes, for example, the signal is 
Gaussian if, and only if, the optimum n-point filter is 
linear and the noise is Gaussian. 


APPENDIX I 


We wish to prove the necessity part of Theorem 4. Thus, we are given that one solution of

$$C_N(t)\,Q(D)\,C_x(t) = P(D)[C_N(t)\,C_x(t)] \tag{40}$$

where

$$Q(D) = \sum_{1}^{n} b_k D^k,$$

$$P(D) = \sum_{1}^{n} a_k D^k,$$

and

$$C_N(t) = \exp\left(-\frac{\lambda_1^2 t^2}{2}\right)$$

is given by

$$C_x(t) = \exp\left(-\frac{\lambda_2^2 t^2}{2}\right).$$

We now have to show that there is no other solution that is a characteristic function with mean zero and variance $\lambda_2^2$. We are, of course, assuming $\lambda_1, \lambda_2 \neq 0$, and that we are only after characteristic functions which have a MacLaurin series expansion around the origin. First, let

$$G_1(t) = \exp\left(-\frac{\lambda_2^2 t^2}{2}\right)$$

and

$$G(t) = \exp\left(-\frac{\lambda^2 t^2}{2}\right), \qquad \lambda^2 = \lambda_1^2 + \lambda_2^2,$$

and, following (42), suppose we express the possible characteristic function as

$$C_x(t) = \left[1 + \sum_{3}^{m} \alpha_k (it)^k\right]G_1(t). \tag{41}$$

That is, the distribution of $X(t_0)$ is expressed by a finite Gram-Charlier expansion.
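Let

$$V(t) = \sum_{3}^{m} \alpha_k (it)^k, \qquad U(t) = \sum_{3}^{m} k\alpha_k (it)^{k-1} = DV(t).$$

Then, substituting (41) into (40), we have

$$C_N(t)\left[\sum_{j=1}^{n} b_j \sum_{k=0}^{j} C_k^j\, D^{j-k}G_1\, D^k(1 + V)\right] = \sum_{j=1}^{n} a_j \sum_{k=0}^{j} C_k^j\, D^{j-k}G\, D^k(1 + V)$$

where the $C_k^j$ are the binomial coefficients. If we collect the terms in this expression for k = 0, we note that by virtue of the fact that $C_x(t) = G_1(t)$ is a solution of (40), these terms already equate to zero. Hence, we have

$$C_N(t)\left[\sum_{j=1}^{n} b_j \sum_{k=1}^{j} C_k^j\, D^{j-k}G_1(t)\, D^{k-1}U(t)\right] - \sum_{j=1}^{n} a_j \sum_{k=1}^{j} C_k^j\, D^{j-k}G(t)\, D^{k-1}U(t) = 0.$$

If we now use

$$D^j[G_1(t)] = (i\lambda_2)^j H_j(\lambda_2 t)\,G_1(t)$$

and

$$D^j[G(t)] = (i\lambda)^j H_j(\lambda t)\,G(t),$$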


this can be expressed as

$$\sum_{j=1}^{n} b_j \sum_{k=1}^{j} C_k^j (i\lambda_2)^{j-k} H_{j-k}(\lambda_2 t)\, D^{k-1}U(t) - \sum_{j=1}^{n} a_j \sum_{k=1}^{j} C_k^j (i\lambda)^{j-k} H_{j-k}(\lambda t)\, D^{k-1}U(t) = 0.$$

Collecting derivatives of U(t), we have

$$\sum_{k=1}^{n}\left[\sum_{j=k}^{n} C_k^j\left(b_j (i\lambda_2)^{j-k} H_{j-k}(\lambda_2 t) - a_j (i\lambda)^{j-k} H_{j-k}(\lambda t)\right)\right] D^{k-1}U(t) = 0. \tag{42}$$

If we substitute

$$U(t) = \sum k\alpha_k (it)^{k-1}$$

in this, we obtain an identity in powers of t, and the coefficient of the highest degree, namely n + m − 2, is given by, omitting nonzero multiplicative constants,

$$b_n\lambda_2^{2n-2} - a_n\lambda^{2n-2},$$

and this must be zero. However, this cannot be zero, since for $G_1(t)$ to be a solution we have already seen that we must have

$$b_n\lambda_2^{2n} = a_n\lambda^{2n}.$$

Hence, this proves that V(t) cannot be a polynomial, or $C_x(t)$ cannot have a finite Gram-Charlier expansion.
In the more general case, where

$$V(t) = \sum_{3}^{\infty} \alpha_k (it)^k,$$

we note that, for t small, we can express V(t) as

$$V(t) = \alpha_m (it)^m [1 + o(1)]$$

where $\alpha_m$ is the first nonzero coefficient. Substituting this into (42) again, we can prove again that V(t) must be identically a constant. This proves the necessity part as required.


BIBLIOGRAPHY


[1] L. A. Zadeh, “Optimum nonlinear filters,” J. Appl. Phys., vol. 24, pp. 396-404; April, 1953.

[2] A. G. Bose, “A Theory of Nonlinear Systems,” Res. Lab. of Electronics, Mass. Inst. Tech., Cambridge, Mass., Tech. Rept. No. 309; 1956.

[3] A. V. Balakrishnan and R. Drenick, “On optimum nonlinear extraction and coding filters,” IRE Trans. on INFORMATION THEORY, vol. IT-2, pp. 166-173; September, 1956.

[4] R. G. Laha, “On a characterization of stable law with finite expectation,” Ann. Math. Statistics, vol. 27, pp. 187-195; March, 1956.

[5] J. L. Doob, “Stochastic Processes,” John Wiley and Sons, Inc., New York, N. Y.; 1953.

[6] M. Loève, “Probability Theory,” D. Van Nostrand Co., Inc., New York, N. Y.; 1955.




CORRECTION 


“On New Classes of Matched Filters and Generalizations of the Matched Filter Concept,” David Middleton, these TRANSACTIONS, June, 1960.

The Editor wishes to call attention to, and to correct, a number of typographical errors in the paper of the above title, which were inadvertently not eliminated in the final stages of printing. These are:

In (8b), lower case s(t) should follow ap. 

Add subscript capital T to G in (5). 

On the fifth line of footnote 24, vis should be viz. 
Two lines below (11), c; should be v;, and on the line before (12), y should read y.

On the eighth line of the second column on page 353, 
7(¢) should be v(é). 

Two lines below (32), delete the comma after “general,” and four lines below (32), the second part of the equation should read:


A [H’(t;, t;) At] (40). 


Three lines above (33), Q§ should be Q%. 

In footnotes 38 and 41, the page reference on the last 
line should be p. 412 instead of p. 348. 

In (87b), 7; should be v;. 

Three lines above (43), the expression h(¢,;, ¢;) should 
be H(t,, t,). 

The right side of (48) should read H’(t, 7) ¥ H(z, fb). 

In (53), the term (W7) should be (W,). 

On the tenth line of footnote 44, P- should be pc. 

Three lines below (57), \,, should be ,,. 

Eleven lines below (57), replace the first 2 in 
hu(T — x) = 2cos [2rm(T — x)/T] by én, the Neumann 
factor €, =ali-e,,— 2,11 == 0: 

On the ninth line below (58b), e’*7’ should be e’*?. 

In the column entitled ‘Filter Structure’ in Table I, 
the equation for b) of 3) should be H’(t, 7) # H(z, 2). 


Correspondence 


Remarks on Sine Waves Plus Noise* 


In a recent letter,¹ Levine and McGhee
presented a short table of the first order 
cumulative distribution functions (cdf) of 
a sine wave of random phase plus Gaussian 
noise. Their table is computed from an 
integral given by S. O. Rice.² However,
in a later publication,³ Rice gave explicit
expressions for the first order probability 
density functions (pdf) and cdf in terms 
of series of Hermite functions and error 
functions. From Cramer’s inequality, Rice’s 
series are seen to converge more rapidly 
than the series for exp (r?/2!/?), where r 
is the signal/noise ratio. Thus, a few terms 
would suffice for moderate values of r. 
For example, 20 terms would give a 5-place 
accuracy for r < 2.5. In addition, Rice 
gave several asymptotic formulas for the 
first order pdf for large r. 

The writer recently derived exact ex- 
pression both for the first and for the 
second order pdf’s of the previously men- 
tioned stationary process in terms of 
rapidly converging series in (tabulated) 
Bessel functions of purely imaginary argument. The results are as follows:⁴

For the first order pdf, 


1) ply) = (20) exp (©) 


-Iyq((2r)'’*y) 
where 


signal + noise 

UG era 
(noise power)!/? 
signal power 

t= ———_——_. 
noise power 


For the second order pdf, 
2) palYr, Yo) = (2m) (1 — pa) ”” 


2 2 
‘ Ua + ae 2a | 
Sa | 2(1 — pa) 


—r(1 — pa ecg 28) 
ex | ps 


- >> cos {20 tan! 


q=—-@ 


* Received by the PGIT, July 8, 1960.

¹ A. Levine and R. B. McGhee, “Cumulative distribution functions for a sinusoid plus Gaussian noise,” IRE Trans. on INFORMATION THEORY, vol. IT-5, pp. 90-91; June, 1959.

² S. O. Rice, “Mathematical Analysis of Random Noise,” Bell Telephone Sys. Monograph No. B-1559, p. 105; 1945.

³ S. O. Rice, “Statistical properties of a sine wave plus random noise,” Bell Sys. Tech. J., vol. 27, pp. 109-157; January, 1948.

⁴ The results are derived at length in an article by the author to appear in Z. Angew. Math. u. Phys., Zurich.


Gi wh fio A}. 
| ae Yo)(1 = pa) be 2 


E — pa COS a 
De D 


1e—"pE 


yA 
(y1 — ys) sin > 
2 il DUN ese ae 
2a 1 — pa 
2))1/2 
wi 
(yi + Ye) cos > 
| —— 
1S aye 
where A is the time interval, p, the noise 
autocorrelation and w the signal frequency. 
The use of Rice’s series or that given in 1) 
would appear to be more convenient than 
numerical integration for evaluation of the 
first order distribution functions in some 
cases. 
R. Leipnik
Michelson Lab., USNOTS 
China Lake, Calif. 


Correction to a Paper by D. G. 
Lampard* 


As a consequence of a recent communication
from D. G. Lampard of Sydney,
Australia, the author wishes to make a 
correction in his recent paper! in which 
the following sentence appears: 


Eq. (4) does not seem to have been 
observed before except by Levin® 
whose formula is in error. 


Mr. Lampard kindly points out that 
(4) has been observed before by several 
other authors, including himself. He lists 
a set of references.” 


I. S. REED
Lincoln Lab. 
Mass. Inst. Tech. 
Lexington, Mass. 


* Received by the PGIT, February 18, 1960. 

1 I. S. Reed, “On the use of Laguerre polynomials
in treating the envelope and phase components of
narrow-band Gaussian noise,” IRE TRANS. ON INFORMATION
THEORY, vol. IT-5, pp. 102-105; September,
1959.

2 D. G. Lampard and J. F. Barrett, “An expansion
for some second-order probability distributions
and its application to noise problems,” IRE TRANS.
ON INFORMATION THEORY, vol. IT-1, pp. 10-16;
March, 1955.

K. S. Miller, R. I. Bernstein, and L. E. Blumenson,
“Generalized Rayleigh processes,” Quart. Appl.
Math., vol. 16; July, 1958.

M. Nakagami, K. Tanaka, and M. Kanehisa,
“The m-distribution as the general formula of intensity
distribution of rapid fading,” Mem. Fac.
Engrg. Kobe Univ., Japan; March, 1957.

S. O. Rice, “Communication in the presence of
noise - probability of error for two encoding schemes,”
Bell Sys. Tech. J., vol. 29; January, 1950.




Note on “On Upper Bounds for
Error Detecting and Error Correcting
Codes of Finite Length”*


In a recent article,1 Wax cites the result
of Laemmel that the best value actually
found for the number of sequences of
length 12 with a minimum Hamming distance
of five is 24. Here it will be illustrated
how this number has been increased to
28. The latter number is in closer agreement
with Wax’s upper bound of 46
than the former.

By using all of the distinct cyclic permutations,
the following pair of binary sequences
of length 12 will produce 24
sequences with a minimum distance of
five between any two sequences:


1 ky 0° LO" 0:07 el OS ee ee 
Io A LS Or; be Or 0 OS ORS 


Adding to these the sequence of all zeros,
the sequence of all ones, and the two distinct
cycles of 010101010101 gives
a total of 28 sequences. These sequences
were also run off on a computer in an
attempt to increase their number. However,
no new sequences were uncovered.
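
The construction described above (take every distinct cyclic permutation of a few generator words, then check the minimum Hamming distance) can be sketched as follows. This is a minimal illustration, not the program the author ran: the function names are hypothetical, and because the two generator sequences are not legible in this copy, the example exercises only the four extra words named in the text. Any candidate generator pair can be passed to `build_code` in the same way.

```python
from itertools import combinations

def cyclic_shifts(word):
    """Return the set of distinct cyclic permutations of a binary string."""
    return {word[i:] + word[:i] for i in range(len(word))}

def hamming(u, v):
    """Number of positions in which two equal-length strings differ."""
    return sum(a != b for a, b in zip(u, v))

def build_code(generators):
    """Union of the cyclic shifts of every generator word."""
    code = set()
    for g in generators:
        code |= cyclic_shifts(g)
    return code

def min_distance(words):
    """Smallest Hamming distance between any two distinct words."""
    return min(hamming(u, v) for u, v in combinations(sorted(words), 2))

# The four extra words named in the text: all zeros, all ones,
# and the two distinct cycles of 010101010101.
EXTRAS = build_code(["0" * 12, "1" * 12, "010101010101"])
print(len(EXTRAS), min_distance(EXTRAS))
```

A full check of the 28-word set would call `min_distance(build_code([g1, g2]) | EXTRAS)` with the two generator sequences and confirm that the result is at least five.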

Additional pairs of sequences of length
12 which autocorrelate and crosscorrelate
favorably are:


001110010010 
00°C 0 DUD PU 


OO0TTEOOTRGLY 
110110101000 


110000101100 
01000 LOULLE MK 


Sylvania Electric Products, Inc.
Buffalo, N. Y.


* Received by the PGIT, March 3, 1960. 

1 N. Wax, “On upper bounds for error detecting
and error correcting codes of finite length,” IRE
TRANS. ON INFORMATION THEORY, vol. IT-5, pp.
168-174; December, 1959.


A Note on Single Error Correcting
Binary Codes*


In a recent paper on double adjacent
error correcting codes,1 the possibility of
deriving a class of related binary single
error correcting codes was mentioned


* Received by the PGIT, February 15, 1960.
1 N. Abramson, “A class of systematic codes for
non-independent errors,” IRE TRANS. ON INFORMATION
THEORY, vol. IT-5, pp. 150-157; December, 1959.




parenthetically. It now appears that this
class of codes may have some advantages
in simplicity of equipment over the ordinary
Hamming2 codes. In this note, therefore,
we shall explain in more detail the construction
and properties of these codes.

Consider single error correcting block
codes with k information digits and
n − k = r check digits. It is well-known2
that for a given r, the maximum value of n
is just 2^r − 1. We shall restrict ourselves
to codes where n is equal to this maximum
value. Let a_1 a_2 ··· a_n be the binary digits
comprising a word of such a code. Then,
for a Hamming code, the a_i must satisfy
the (mod 2) equation


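$$x_1 a_1 + x_2 a_2 + \cdots + x_n a_n = 0 \tag{1}$$

where $x_1$ is the $r \times 1$ matrix $(1, 0, \ldots, 0)^T$, $x_2$ is the matrix $(0, 1, 0, \ldots, 0)^T$, $x_3$ is the matrix $(1, 1, 0, \ldots, 0)^T$,
and the remaining $x_i$ are developed in
order from the binary counting sequence
until $x_n$, which is given by $(1, 1, \ldots, 1)^T$.
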
For the class of codes mentioned by
Abramson,1 however, the a_i must satisfy
the (mod 2) equation3


2 R. W. Hamming, “Error correcting and error
detecting codes,” Bell Sys. Tech. J., vol. 29, pp.
147-160; April, 1950.

3 The eS Rs in (2) Nog While ele by 
J. E. Meggitt in “Error correcting codes for correcting
bursts of errors,” to be published in the IBM
Journal.


$$y_1 a_1 + y_2 a_2 + \cdots + y_n a_n = 0, \tag{2}$$

where $y_1$ is the $r \times 1$ matrix

$$y_1 = (1, 0, 0, \ldots, 0)^T, \tag{3}$$

$$y_2 = T y_1, \quad y_3 = T y_2, \quad \ldots, \quad y_n = T y_{n-1},$$


and T is any r by r binary matrix whose
binary characteristic polynomial is both
irreducible and of maximal period.4 An
alternate characterization is that the
elements of the top row of y_1, y_2, ···, y_n
define an m-sequence5 ending in r − 1
zeros and the elements of the jth row of
the y_i define the same m-sequence shifted
to the right by j − 1 digits.

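We shall illustrate the preceding paragraph
for the case n = 7 and k = 4. An
acceptable T matrix for any r ≤ 19 (and
thus, n ≤ 524,287) may be obtained from
Marsh’s tables.6 We select

$$T = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \tag{4}$$

so that (2) may be written as

$$\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} a_1 + \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} a_2 + \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} a_3 + \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} a_4 + \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} a_5 + \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} a_6 + \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} a_7 = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}. \tag{5}$$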


Note that for this code, it is possible
to take the last three digits a_5, a_6 and a_7
as the parity digits, and the first four
digits a_1, a_2, a_3 and a_4 as the information
digits. It can be shown in general that
the choice of y_1, as in (3), allows one to
place all parity digits at the end of the 
block—an important advantage in many 
applications. Furthermore, this may be 


4 B. Elspas, “Theory of autonomous linear sequential
networks,” IRE TRANS. ON CIRCUIT
THEORY, vol. CT-6, pp. 45-60; March, 1960.

5 N. Zierler, “Several Binary-Sequence Generators,”
Lincoln Lab., Mass. Inst. Tech., Lexington,
Mass., Tech. Rept. No. 95.

6 R. W. Marsh, “Table of Irreducible Polynomials
Over GF(2) Through Degree 19,” National
Security Agency, Washington, D. C.; October 24,
1957.



done without destroying the simplicity of 
equipment necessary to implement the 
code. Since the 7’ matrix corresponds to a 
time delay of one unit in an r-stage feed- 
back shift register, this shift register may 
be used to time the generation of the parity 
digits.’ For example, Fig. 1 shows the shift 


Fig. 1. 


register corresponding to (4) which gener- 
ates the m sequence used in (5), 1110100. 
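
As a check on the n = 7 example, the columns of (5) can be regenerated from the T matrix of (4), and the resulting code can be exercised on single errors. The sketch below is only an illustration in software, not the shift-register implementation discussed in the text. All function names are hypothetical, and the parity digits are found by brute force over the eight possibilities rather than by any method given in the note.

```python
# T is the matrix of (4); everything else here is an illustrative helper.
T = [[1, 0, 1],
     [1, 0, 0],
     [0, 1, 0]]

def mat_vec(matrix, vector):
    """Multiply a binary matrix by a binary column vector, mod 2."""
    return [sum(m * v for m, v in zip(row, vector)) % 2 for row in matrix]

def check_columns(matrix, n):
    """Return y_1 .. y_n with y_1 = (1, 0, ..., 0) and y_{i+1} = T y_i."""
    y = [1] + [0] * (len(matrix) - 1)
    cols = []
    for _ in range(n):
        cols.append(y)
        y = mat_vec(matrix, y)
    return cols

COLS = check_columns(T, 7)
# Top row of y_1 .. y_7, read left to right.
M_SEQUENCE = "".join(str(col[0]) for col in COLS)

def syndrome(word):
    """Left-hand side of (5) for a 7-digit word, as a 3-element list."""
    s = [0] * len(T)
    for bit, col in zip(word, COLS):
        if bit:
            s = [(a + b) % 2 for a, b in zip(s, col)]
    return s

def encode(info):
    """Append parity digits a_5, a_6, a_7 so that (5) is satisfied."""
    for p in range(8):
        word = list(info) + [(p >> 2) & 1, (p >> 1) & 1, p & 1]
        if not any(syndrome(word)):
            return word
    raise ValueError("no parity assignment found")

def correct(word):
    """Flip the digit whose column equals the syndrome, if it is nonzero."""
    s = syndrome(word)
    fixed = list(word)
    if any(s):
        fixed[COLS.index(s)] ^= 1
    return fixed

print(M_SEQUENCE)
```

Because the seven columns are distinct and nonzero, a nonzero syndrome identifies the erroneous digit uniquely, which is the single error correcting property claimed for the code.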

Implementations based on the use of the 
feedback shift register have also been ob- 
tained by Peterson® and Meggitt. 


N. M. ABRAMSON
Stanford University 
Stanford, Calif. 


7N. Abramson and B. Elspas, ‘‘Double-error- 
correcting encoders and decoders for non-independ- 
ent binary errors,’’ Proc. UNESCO Conf. on In- 


formation Processing, International Documents 
Service, Columbia University Press, New York, 
N. Y.; 1960. 


8 W. Peterson, “Error Correcting and Error Detecting
Codes,” to be published by Technology
Press.
9 J. Meggitt, “Error correcting codes for correcting
bursts of errors,” to be published in the IBM
J. Res. & Dev.


Transmission of Photographic Data 
by Electrical Transmission* 


The purpose of this note is to dispel 
some misconceptions arising from the 
following statement in the Space Hand- 
book: 


“A communication system with a band- 
width of 6 megacycles per second will 
have to operate continuously for 22.5 
minutes to transmit the quantity of 
information that can be stored on a 
single 9 X 9 inch photograph at 100 
lines per millimeter.” 


The 22.5-minute transmission time can 
be deduced from any of a large set of 
plausible assumptions. However, the trans- 
mission time is quite sensitive to the 
assumptions which are made, and the 
statement in the Space Handbook should 
not be accepted without reservation. The 
authors have found two citations of this 
statement (in classified documents) which 
show in each case that the author has 
been misled. 

As an example, here is a set of assump- 
tions which will lead to the 22.5-minute 
figure: suppose that the source material 


* Received by the PGIT, February 10, 1960. 

1 “Space Handbook: Astronautics and Its Appli- 
cations, Staff Report of the Select Committee on 
Astronautics and Space Exploration,’’ U. S. Govern- 
ment Printing Office, Washington, D. C., 86th Con- 
gress, 1st*Session, House Document No. 86, p. 181; 
1959. 




consists of (200)? independent points per 
square millimeter, each carrying four bits 
of information (i.e., 16 distinguishable
gray levels), and that the electrical trans- 
mission system transmits information at the 
rate of one bit per second for each cycle of 
bandwidth. 

These assumptions may be valid in a 
particular case. In other cases, the infor- 
mation content of the picture may be less 
and the channel capacity of the trans- 
mission link greater. Consider the following 
example. 

A photographic resolution of N lines 
per millimeter means2 that an array of 2N
equally-spaced lines, alternately black and 
white, can be resolved. This does not 
mean, in general, that 16 gray levels can 
be distinguished at this resolution; on the 
contrary, the resolution is defined in terms 
of the number of lines per millimeter at 
which only two gray levels can be distin- 
guished. 

Generally speaking, a picture does not 
preserve perfect clarity of tonal detail 
right up to the limit of resolution. On the 
contrary, the detail “cuts off’ with de- 
creasing wavelength. If the resolution is 
determined, for example, by the finite 
size of a scanning aperture, the loss of 
gray-scale resolution as a function of 
wavelength can be computed explicitly. 

To show how this affects the information
content of the picture, imagine the following
idealized case: for long wavelengths,
the number of gray levels is G_m, and for
short wavelengths, the number of gray
levels is proportional to the nth power
of wavelength, i.e., λ^n or 6n db per octave
cutoff. This results in the signal spectrum
of Fig. 1. The figure is plotted for a maximum
number of gray scales G_m of 16,
a resolution 1/d of 100, and a cutoff rate
n of 1, or 6 db per octave. The information
per spot is proportional to


$$\overline{\log_2 G} = d \int_0^{1/d} \log_2 G \; d(1/\lambda) = n \log_2 e \, \left(1 - G_m^{-1/n}\right)$$

= 1.35 for G_m = 16, n = 1,
= 2.16 for G_m = 16, n = 2,
= 2.60 for G_m = 16, n = 3.


The figures are quite insensitive to G_m.
In order to show how much is taken away 
by the cutoff, the curve is plotted with 
a linear frequency scale as well as with 
the customary logarithmic scale. An average 
value of 2.2 bits per point seems on the 
whole more realistic than 4. 
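
The bits-per-point figures can be checked numerically under one reading of the idealized case. The sketch below assumes that the number of gray levels is G = min(G_m, (λ/d)^n), so that G falls to one level at the resolution limit λ = d; this model and the function names are assumptions made for illustration, chosen because they give the 2.16 figure quoted for G_m = 16, n = 2.

```python
import math

def bits_per_spot(g_max, n):
    """Closed form n * log2(e) * (1 - g_max**(-1/n)) for the assumed model."""
    return n * math.log2(math.e) * (1.0 - g_max ** (-1.0 / n))

def bits_per_spot_numeric(g_max, n, steps=200_000):
    """Average of log2 G over spatial frequency, by the midpoint rule.

    u = d / wavelength runs over (0, 1]; G = min(g_max, u**(-n)).
    """
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) / steps
        total += math.log2(min(g_max, u ** (-n)))
    return total / steps

for n in (1, 2, 3):
    print(n, round(bits_per_spot(16, n), 2), round(bits_per_spot_numeric(16, n), 2))
```

Under this assumption the average stays well below the 4 bits per point of the Handbook example for each of the three cutoff rates.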


2 D. G. Fink, “Television Engineering,” McGraw-Hill
Book Co., Inc., New York, N. Y., 2nd ed.,
p. 24 ff.; 1952.




Fig. 1—Number of gray levels as a function 
of wavelength. 


Now let us look at the transmission
link. Many types of 6-mc links are conceivable,
but let us choose as an example
a link which might be designed to transmit
commercial television with a 26-db SN
ratio at low frequencies and a 4-mc bandwidth.
Assume that the channel has a
3-db point at 4 mc and that it cuts off
thereafter at 36 db per octave. Transmission
is reduced 11 db at 6 mc, leaving
an available SN ratio greatly in excess
of the 3 db required at the cutoff of the
picture information. (A bit of shaping,
either pre- or postemphasis, is required to
preserve the appearance of the picture,
but not to preserve the information.)

With this channel, a line could be scanned
at a rate

$$\frac{B}{N} \ \text{mm/second}$$

without significant loss. The number of
scanning lines required is not 2N/mm, as
one might imagine, but3 $2\sqrt{2}\,N$, or in
practice, 3N. Hence the time required to
scan a picture of area A square inches is

$$t = (25.4)^2 A \cdot (N/B) \cdot 3N = \frac{1940\, A N^2}{B} \approx 2000\, A N^2 / B,$$

where $B = 6 \cdot 10^6$ is the bandwidth. For
N = 100 lines/mm,
A = 81 square inches,

$$t = 262 \ \text{seconds} = 4.4 \ \text{minutes}.$$
Countless examples could be generated, 
each leading to a different result. 
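
The two estimates can be compared directly. The sketch below evaluates the scanning-time formula above and, for contrast, the Handbook-style assumptions stated earlier (200 × 200 points per square millimeter, four bits per point, one bit per second per cycle of bandwidth). The function and parameter names are hypothetical; with 25.4 mm per inch the Handbook-style case comes out near 23 minutes, close to the quoted 22.5.

```python
MM_PER_INCH = 25.4

def handbook_time_s(area_in2, points_per_mm=200, bits_per_point=4,
                    bandwidth_hz=6e6):
    """Seconds to send every point at one bit per second per cycle."""
    area_mm2 = area_in2 * MM_PER_INCH ** 2
    total_bits = area_mm2 * points_per_mm ** 2 * bits_per_point
    return total_bits / bandwidth_hz

def scan_time_s(area_in2, lines_per_mm, bandwidth_hz, line_factor=3.0):
    """t = (25.4)^2 * A * (N / B) * 3N, with 3N scanning lines per mm."""
    area_mm2 = area_in2 * MM_PER_INCH ** 2
    seconds_per_mm_of_line = lines_per_mm / bandwidth_hz
    return area_mm2 * line_factor * lines_per_mm * seconds_per_mm_of_line

AREA = 81.0  # a 9 x 9 inch photograph
print(round(handbook_time_s(AREA) / 60.0, 1), "minutes (Handbook-style)")
print(round(scan_time_s(AREA, 100, 6e6) / 60.0, 1), "minutes (scanned link)")
```

Changing `line_factor`, the resolution, or the bits per point shows how strongly the answer depends on the assumptions, which is the point of the note.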

In conclusion, one can say that the 
estimate given in the Space Handbook is 
extremely conservative and should not be 
construed as a limit on any particular 
communication system without looking 


3 Ibid., pp. 25-26.




further to see whether the assumptions on
which it is based are valid in the case at
hand. A 6-mc channel which will transmit
a 9 × 9-inch photograph with a resolution
of 100 lines/mm in four minutes is wholly
practical.

G. RAISBECK
Inst. for Defense Analyses
ARI

Washington, D. C.
J. GOLDHAMMER
Chicago Aerial Industries
Barrington, Ill.


A Note of Caution on the Square-Law
Approximation to an Optimum
Detector*


When zero-mean Gaussian noise voltage
is added to a sinusoidal signal, the envelope
of the sum has the well-known1 probability
density function

$$W(R; A) = \begin{cases} \dfrac{R}{\psi} \exp\left(-\dfrac{R^2 + A^2}{2\psi}\right) I_0\!\left(\dfrac{AR}{\psi}\right), & R \ge 0 \\ 0, & R < 0 \end{cases} \tag{1}$$

in which A is the amplitude of the signal
sinusoid, ψ is the mean square of the noise,
the random variable R is the amplitude
of the envelope and $I_0(v)$ is the modified
Bessel function of order zero and argument
v.

In terms of normalized quantities, (1)
becomes

$$w(x; a) = \begin{cases} x \exp\left(-\dfrac{x^2 + a^2}{2}\right) I_0(ax), & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{2}$$

in which

$$x = R/\sqrt{\psi}, \qquad a = A/\sqrt{\psi}.$$

For small signal-to-noise ratios, it has
been common in the past to approximate
the optimum detector associated with this
probability density function by a square-law
detector.2,3 The optimum detector
means here a device whose functional form
is such that its output represents the
logarithm of the ratio of the appropriate
a posteriori probabilities.


* Received by the PGIT, March 3, 1960; revised
manuscript received, April 12, 1960.

1 S. O. Rice, “Mathematical analysis of random
noise,” Bell Sys. Tech. J., vol. 23, pp. 282-332;
June, 1944; vol. 24, pp. 46-156; January, 1945.

2 D. Middleton, “Statistical criteria for the detection
of pulsed carriers in noise (pts. I and II),”
J. Appl. Phys., vol. 24, pp. 371-378, 379-391; April,
1953.

3 W. W. Peterson, T. G. Birdsall, and W. C.
Fox, “The theory of signal detectability,” IRE
TRANS. ON INFORMATION THEORY, vol. IT-4, pp.
171-212; September, 1954.



Bussgang and Middleton4,5 and Blasbalg6
have pointed out that this square-law
approximation can lead to certain pitfalls
when improperly applied. In the Russian
literature, Flejshman,7 in a paper devoted
to the subject, gives a detailed discussion
of the same problem under what are really
the same assumptions. Still, we find that
this problem is not fully appreciated and
that the simple square-law approximation
is often accepted without qualifications.
It appears important once again to draw
attention to a difficulty which often arises.
Let z be the logarithm of the ratio of
a posteriori probabilities

$$z = \ln \frac{w(x; a_1)}{w(x; 0)} \tag{3}$$

in which x is the observed value of the
envelope and the amplitude $a_1$ is the one
chosen to represent the hypothesis that
the signal is present. Substituting from
(2) in (3), one gets

$$z = -\frac{a_1^2}{2} + \ln I_0(a_1 x). \tag{4}$$

The optimum detector in problems involving
signal plus noise has a law which
is functionally of the form (4), where x
and z are, respectively, the normalized
envelope at the input and the normalized
voltage at the output of the detector.
In the region of the detector law where
the instantaneous envelope is small (x < 1),

4 J. J. Bussgang and D. Middleton, “Sequential
Detection of Signals in Noise,” Cruft Lab., Harvard
University, Cambridge, Mass., Tech. Rept. No. 175;
August 31, 1955.

5 J. J. Bussgang and D. Middleton, “Optimum
sequential detection of signals in noise,” IRE TRANS.
ON INFORMATION THEORY, vol. IT-1, pp. 5-18;
December, 1955.

6 H. Blasbalg, “The sequential detection of a
sine-wave carrier of arbitrary duty factor in Gaussian
noise,” IRE TRANS. ON INFORMATION THEORY, vol.
IT-3, pp. 248-256; December, 1957.

7 B. S. Flejshman, “On the optimal detector
with a log I0 characteristic for the detection of a
weak signal in the presence of noise,” Radiotekhnika
i Elektronika, vol. 2, pp. 726-734; June, 1957. (In
Russian.)



(4) can be approximated. The problem
hinges on keeping enough terms in the
Taylor series expansion of the logarithm.
For small $a_1 x$, one gets

$$z = -\frac{a_1^2}{2} + \frac{a_1^2 x^2}{4} - \frac{a_1^4 x^4}{64} + O(a_1^6 x^6). \tag{5}$$


It is hasty to assume from (5) that the
optimum detector law contains just the
first two terms even when the envelope
x and $a_1$ are both small. The term in $x^4$
must be appropriately included. To demonstrate
this, consider E(z), the expected
value of z. Since from (2) it can be shown
that

$$E(x^2) = 2 + a^2, \qquad E(x^4) = 8 + 8a^2 + a^4, \tag{6}$$

keeping the term in $x^4$ in (5), it follows
that

$$E(z) = -\frac{a_1^2}{2} + \frac{a_1^2}{4}\,(2 + a^2) - \frac{a_1^4}{64}\,(8 + 8a^2 + a^4). \tag{7}$$

For small $a_1$ and $a$, (7) is approximately

$$E(z) = \frac{a_1^2}{4}\left(a^2 - \frac{a_1^2}{2}\right). \tag{8}$$

However, if only the first two terms of
(5), up to $x^2$, had been kept then one
would have

$$z = -\frac{a_1^2}{2} + \frac{a_1^2 x^2}{4} \tag{9}$$

and one would obtain

$$E(z) = \frac{a_1^2 a^2}{4}. \tag{10}$$




Thus, the “square-law” approximation
of E(z) for a = 0 would have been taken
improperly as 0, rather than correctly as
$-a_1^4/8$. This error in E(z) which results
from (9) is not merely quantitative; it
implies that the expected output of the
optimum detector corresponding to (4) is
never negative, no matter how small a
becomes.
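
The size of the effect can be checked numerically. The sketch below integrates against the normalized envelope density w(x; a) = x exp(−(x² + a²)/2) I₀(ax) to confirm the moments E(x²) = 2 + a² and E(x⁴) = 8 + 8a² + a⁴, and then compares the mean output of the exact law z = −a₁²/2 + ln I₀(a₁x) with that of the square-law form when no signal is present (a = 0). The helper names, the integration limits and the choice a₁ = 0.3 are illustrative assumptions only.

```python
import math

def bessel_i0(u, terms=60):
    """Modified Bessel function I0(u) summed from its power series."""
    q = (u * u) / 4.0
    term, total = 1.0, 1.0
    for k in range(1, terms):
        term *= q / (k * k)
        total += term
    return total

def envelope_pdf(x, a):
    """Normalized envelope density w(x; a) for x >= 0."""
    return x * math.exp(-(x * x + a * a) / 2.0) * bessel_i0(a * x)

def expect(f, a, upper=12.0, steps=4000):
    """E[f(x)] under w(x; a) by composite Simpson integration on [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * f(x) * envelope_pdf(x, a)
    return total * h / 3.0

def z_exact(x, a1):
    """Optimum law: -a1^2/2 + ln I0(a1 x)."""
    return -a1 * a1 / 2.0 + math.log(bessel_i0(a1 * x))

def z_square_law(x, a1):
    """Square-law form: first two terms of the series only."""
    return -a1 * a1 / 2.0 + a1 * a1 * x * x / 4.0

A1 = 0.3  # hypothesized signal-to-noise ratio (illustrative value)
MEAN_EXACT = expect(lambda x: z_exact(x, A1), 0.0)
MEAN_SQUARE = expect(lambda x: z_square_law(x, A1), 0.0)
print(MEAN_EXACT, MEAN_SQUARE, -A1 ** 4 / 8.0)
```

With noise alone, the square-law mean comes out at zero to within integration error, while the exact mean is negative and close to −a₁⁴/8, in line with the argument above.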

In the case of sequential detection? 
the magnitude of E(z) is inversely proportional
to the average sample size. Thus
a value of zero for E(z) could falsely imply
infinite sample sizes.

It is possible that other results obtained 
by the use of the square-law approximation 
(9) should be re-examined, for they too 
may be affected. 

One way out of the difficulty consists
of replacing $-a_1^4 x^4/64$ in (5) by its expected
value $-a_1^4/8$ in the weak signal case.
This replacement is equivalent to a change 
of bias, leaving the detector still a square- 
law device. Such an approach removes the 
basic difficulty*®® of reconciling (8) and 
(10). 

Even keeping the term in $x^4$ in (5) may
not suffice when $a_1$ is large compared to
unity ($a_1$ is the hypothesized voltage signal-to-noise
ratio), even though a (the actual
signal-to-noise ratio) itself is small. Enough
terms in $a_1$ must be kept in (7) to evaluate
the coefficients of $a^0$ and $a^2$.

Of course, if a detector were chosen to 
follow the square law represented by the 
first two terms in (5), the results questioned 
here would apply. However, such a de- 
tector is not optimum in the sense of (3). 

The square-law approximation (9) is 
so often used without attention to the 
critical influence of higher-order terms 
that we have felt it important to offer 
this note of caution. 

J. J. BUSSGANG
W. L. MUDGETT
RCA 

Burlington, Mass. 


8 D. Middleton, ‘‘An Introduction to Statistical 
Communication Theory,’ McGraw-Hill Book Co., 
Inc., New York, N. Y., pp. 836, 876, 900; 1960. 




Contributors 


A. V. Balakrishnan (S ’43—A ’55—M ’56) 
was born in Palghat, India, on December 4, 
1922. He received the Bachelor’s and 
Master’s degrees in 
physics from the University
of Madras,
India, in 1945. He 
came to the United 
States in 1947 on 
a two-year Indian 
Government scholar- 
ship. In 1950 he was 
awarded the Master’s 
degree in electrical 
engineering and in 
1954, the Ph.D. de- 
gree in mathematics, 
both from the University of Southern 
California, Los Angeles. 

While doing graduate work, he was a 
laboratory assistant, teaching associate and 
lecturer at the University of Southern 
California, and an assistant instructor in 
the Mathematics Department of Yale Uni- 
versity, New Haven, Conn. 

From 1954 to 1956, he was with RCA, 
Camden, N. J., working on communication 
and control problems, including multipath 
transmission of video signals, noise cancel- 
lation systems and nonlinear filters. From 
1956 to 1957 he was an assistant professor 
of mathematics at the University of South- 
ern California, and from 1957 to 1959 at 
the University of California, Los Angeles. 
At present he is with The Space Tech- 
nology Laboratories, Los Angeles, heading 
a research group on communication and 
control theory. 

Dr. Balakrishnan is a member of Tau 
Beta Pi and Sigma Xi. 


A. BALAKRISHNAN 


Marvin Blum (M ’56) was born on 
June 18, 1928, in New York, N. Y. He 
received the B.S. degree from Brooklyn 
College, N. Y., in 
1948, and has taken 
graduate courses in 
mathematics, phys- 
ics, and electrical 
engineering at George 
Washington Univer- 
sity, Washington, 
D. C., American Uni- 
versity, Washington, 
D. C., Maryland 
University, College 
Park, Md., the Na- 
tional Bureau of 
Standards School, Denver, Colo., and the 
University of California, Los Angeles Ex- 
tension. 

He worked at the National Bureau of 
Standards in the Central Radio Propagation 
Laboratory in Washington, D. C., until 
1950. He then transferred to the Ordnance
Division, where he conducted radar reflec- 
tion studies relating to missile proximity 
fuzes. In 1954, he was employed by Convair,
San Diego, Calif., where he conducted
theoretical investigations in smoothing and 
prediction filters, noise simulation, and 
data reduction. He left the Convair Astro- 
nautics Division in July, 1959, to join 
the System Development Corp., Santa 
Monica, Calif., where he is working on 
prediction and smoothing filters as applied 
to space defence. 

Mr. Blum is a member of the Society 
for Applied Mathematics. 




E. N. Gilbert was born in Woodhaven, 
N. Y., on July 25, 1923. He received the 
B.S. degree in physics from Queens College, 
Flushing, N. Y., in 
1943, and the Ph.D. 
degree in mathe- 
matics from the 
Massachusetts Insti- 
tute of Technology, 
Cambridge, in 1948. 
At M.I.T. he held an 
Applied Mathe- 
matics Fellowship. 

From 1944 to 1946 
he designed antennas 
at the M.I.T. Radia- 
tion Laboratory. 
Since 1948 he has been a member of the 
Mathematical Research Department of Bell 
Telephone Laboratories in Murray Hill, 
N. J., where his current main interests are 
combinatorial analysis and probability and 
their applications to problems in switch- 
ing and coding. 

Dr. Gilbert is a member of the American 
Mathematical Society. 


E. N. GILBERT 


William A. Janos (M ’59) was born on 
November 9, 1926, in Easton, Pa. He 
received the B.S. degree in physics from 
Rutgers University, 
New Brunswick, 
N. J., in 1951, and 
the M.A. and Ph.D. 
degrees in 1954, and 
1958, respectively, 
both in physics, from 
the University of 
California, Berkeley. 
He was recipient of 
the University’s ap- 
pointed teaching as- 
sistantship in the 
physics department,
and also the Convair Scholarship Award. 

He served in the U. S. Army from 1945 
to 1947. From 1951 to 1960, he was with 
Convair and Convair-Astronautics, San 
Diego, Calif., where he engaged in applied 
analysis and spectral theory related to 
analytical dynamics, wave propagation and 
diffraction, variational techniques in least- 
time trajectories for thrust-propelled flight, 
control system analysis and synthesis, 


W. A. JANOS 




noise theory and optimal linear estimation.
He is presently a staff physicist in the
Physics Department of the Advanced Development
Laboratory, Raytheon Co.,
Wayland, Mass.

Dr. Janos is a member of the American
Physical Society.




M. Vernon Johns, Jr., was born in
Berkeley, Calif., on September 27, 192 
He received the B.A. degree in economics
from Stanford University,
Stanford,
Calif., in 1949 and
the Ph.D. degree in
mathematical statistics
from Columbia
University, New
York, N. Y., in 195 
Since 1956 he has
been with Stanford
University, first as
research associate in
statistics and, since
1957, as assistant
professor of statistics. His research activity
has been mainly in the areas of statistical
decision theory and probability problems
related to statistics.

Dr. Johns is a member of the American
Mathematical Society, the Institute of
Mathematical Statistics, the American
Statistical Association, the American Association
for the Advancement of Science
and Sigma Xi.




M. V. JoHNs, JR. 


W. Wesley Peterson (S ’49—A ’52— 
58) was born in Muskegon, Mich., on April
22, 1924. He attended the University of
Michigan, Ann Arbor,
receiving the
A.B. degree in mathematics
in 1948, the
M.S.E. degree in
1950, and Ph.D. degree
in electrical engineering
in 1954.

From 1951 to 1¢ 
he was a research
associate in the Engineering
Research
Institute of the University
of Michigan.
He was employed by the IBM Engineering
Laboratory in Poughkeepsie, N. Y., from
1954 to 1956. In 1956 he joined the faculty
of the University of Florida, Gainesville,
where he is now an associate professor.
He is currently on leave as visiting associate
professor of electrical engineering at the
Massachusetts Institute of Technology,
Cambridge.

Dr. Peterson is a member of the American
Mathematical Society and Sigma Xi.


W. W. PETeRson 




Morris Plotkin was born in Philadelphia,
Pa., on February 9, 1914. He received the
B.S. degree in electrical engineering in 1934,
the M.A. degree in 
mathematics in 1951, 
and the M.S. degree 
in electrical engineer- 
ing in 1952, all from 
the University of 
Pennsylvania, Phila- 
delphia. 

Prior to World 
War IJ, he worked 
in electrical power 
engineering. Upon 
discharge from the 
Navy in 1946, he
joined the research staff of the Moore
School of Electrical Engineering at the
University of Pennsylvania.

In 1951 he joined the Naval Air Development
Center at Johnsville, Pa., where he
served as a mathematics consultant and
directed the operation of digital and analog
computers. He was instrumental in the
design and operation of the first modern
large-scale simulation programs on an analog
computer. Later, he supervised the
integration of a human centrifuge and an
analog computer into a facility for closed-loop
simulation of aircraft control systems.
Since May, 1959, he has been chief of
analysis at Auerbach Electronics Corporation,
Philadelphia, Pa., where he is engaged
in the mathematical analysis of physical
and organizational systems. His current
activities include the development of
special techniques for radar data extraction
and solution of queueing problems in
digital communications networks.

Mr. Plotkin is a member of Sigma Xi
and the American Mathematical Society.


M. PLOTKIN


Richard A. Silverman (M ’54—SM ’58) 
was born on June 29, 1926, in Boston,
Mass. He received the A.B. degree from
Harvard University, 
Cambridge, Mass., in 
1946, the M.A. de- 
gree from Columbia 
University, New 
York, N. Y., in 1948, 
and the Ph.D. degree 
from Harvard in 
1951. 

For three years he 
was associated with 
the Massachusetts 
Institute of Tech- 
nology, Cambridge, 
first as a staff member of the Lincoln
Laboratory and then as a research associate
in the Department of Electrical Engineering.
Currently, he is a research scientist
at the New York University Institute of 
Mathematical Sciences, New York, in the 
Division of Electromagnetic Research. 

Dr. Silverman is a member of Phi Beta 
Kappa, Sigma Xi, and the Society for 
Industrial and Applied Mathematics. 




Thomas E. Stern (S ’54—M ’57) was 
born in New Rochelle, N. Y., on March 
29, 1930. He attended the Massachusetts 
Institute of Tech- 
nology, Cambridge, 
where he received his 
undergraduate edu- 
cation under the co- 
operative program in 
electrical engineer- 
ing. After receiving 
the B.S. and M.S.
degrees in 1953, he 
became a research as- 
sistant at the M.I.T. 
Research Laboratory 
of Electronics, and in 
1956 received the Sc.D. degree from M.I.T. 

At present, he is an assistant professor of 
electrical engineering at Columbia Univer- 
sity, New York, N. Y. His areas of research 
include analog computation, nonlinear net- 
work theory and information theory. 

Dr. Stern is a member of Eta Kappa 
Nu and Sigma Xi. 


T. E. STERN


John B. Thomas (S ’52—M ’56) was 
born in New Kensington, Pa., on July 14, 
1925. He received the A.B. degree from 
Gettysburg College, 
Gettysburg, Pa., in 
1944, the B.S. degree 
from The Johns Hop- 
kins University, Bal- 
timore, Md., in 1952, 
and the M.S. and 
Ph.D. degrees from 
Stanford University, 
Stanford, Calif., in 
1953 and 1955, re- 
spectively. 

During World War 
II he served in the 
U. S. Army. From 1946 to 1952 he was 
employed by Koppers Co., Inc., Baltimore, 
Md., in the Industrial Gas Cleaning De- 
partment, first as an electrical engineer and 
then as assistant chief engineer of the 


J. B. THOMAS




department. Since 1955 he has been a 
member of the Department of Elec- 
trical Engineering at Princeton University, 
Princeton, N. J., and is currently an 
associate professor in that department. 

Dr. Thomas is a member of the American 
Institute of Electrical Engineers, Tau Beta 
Pi, and Sigma Xi. 




Eugene Wong (S ’57) was born in Nan- 
king, China, on December 24, 1934. He 
received the B.S.E., A.M., and Ph.D. degrees
from Princeton
University, Princeton,
N. J., in 1955,
1958, and 1959, respectively.
From
1955 to 1956 he was 
employed by I.B.M., 
Poughkeepsie, N. Y., 
where he engaged in 
semiconductor re- 
search. For the year 
1959-1960, he is a 
National Science 
Foundation post- 
doctoral fellow at the University of Cam- 
bridge, Cambridge, England. 

Dr. Wong is a member of Sigma Xi. 


E. WONG


Neal Zierler was born on September 17, 
1926, in Baltimore, Md. He received the 
A.B. degree in physics at the Johns Hopkins 
University, Balti- 
more, in 1944. After 
service in the Navy, 
he returned to Johns 
Hopkins for gradu- 
ate work in physics 
and then attended 
Harvard University,
Cambridge, Mass., 
where he received the 
M.A. degree in 
mathematics in 1949 
and the Ph.D. degree 
in 1959. 

He worked at the Ballistic Research 
Laboratories, Army Ordnance, Aberdeen, 
Md., in 1951, and at the Instrumentation 
Laboratory of Massachusetts Institute of 
Technology from 1952 to 1953. Since April, 
1954, he has been a member of the staff of 
M. I. T.’s Lincoln Laboratory, at Lexing- 
ton, Mass. 

Dr. Zierler is a member of the American 
Mathematical Society and the Mathemati- 
cal Association of America. 


N. ZIERLER




Book Reviews 


Information Theory and Statistics—S. Kullback. (John Wiley and 
Sons, Inc., New York, N. Y., 1959; and Chapman and Hall, Ltd., 
London, England; 1959. 395 pages + xvii pages. $12.50.) 

This book is about statistics; in particular, the testing of statis-
tical hypotheses. It is not concerned with the sort of applications 
of information theory to communications problems that constitute 
the major part of these Transactions. Since, however, statistics 
has to do with gathering and using information (in the wide sense), 
and since “information theory’’ (in the Shannon sense) purports 
to be a quantitative theory of information and its handling, one 
is tempted to suppose that statistics could be given an information 
theoretic basis. This book represents a more or less unified account 
of the author’s pioneering efforts in this direction. 

Starting with a few fairly simple postulates of an information 
theoretic nature, but quite ad hoc from the statistician’s point of 
view, the author derives an impressive array of old and new results 
in statistical hypothesis testing and related ideas. Most of the 
material is derived from two basic notions: 1) the directed diver- 
gence from one simple hypothesis to another and 2) the directed 
divergence from a sample to a hypothesis. Detailed definitions of 
these concepts will be found in the first five chapters of the book. 

By using the tools mentioned above, the author gives, in Chapters 
6-13 (comprising 244 pages), detailed and often numerical dis- 
cussion of examples in multinomial and Poisson populations, con- 
tingency tables and multivariate normal populations. Substantial 
problem lists appear at the end of each chapter. Referencing is 
extremely thorough in all sections, including problem sections. 

In criticism, the reviewer remarks that the reader is unnecessarily 
exposed to much secondary material before the author’s main 
thesis is reached in Chapter 5; the definition of critical region on 
page 85 is not clear; in several places, discussion of tests based on 
the symmetrized divergence could well have been omitted. These 
are minor faults, however. 

The author states in the Preface, ‘‘Applications to more general 
stochastic processes, including sequential analysis will make a 
natural sequel, but are outside the scope of this book.” 


S. P. Lloyd
Bell Telephone Labs. 
Murray Hill, N. J. 


The Theory of Optimum Noise Immunity—V. A. Kotel’nikov
(translated from the Russian by R. A. Silverman). (McGraw-Hill 
Book Co., Inc., New York, N. Y.; 1959. 140 pages + xi pages. 
Illus. 74 x 104. $7.50.) 

Vladimir Aleksandrovich Kotel’nikov is one of the few electronics 
engineers ever to be elected Academician in the Academy of Sciences 
of the U. S. S. R. In 1947 he published a doctoral dissertation 
which was subsequently republished in book form in Russian. 
R. A. Silverman’s excellent translation of this Russian classic 
should be of interest to English speaking communications engineers 
for both technical and historical reasons. 




Kotel’nikov deals with the problem of analyzing the performance
of various communication systems in the presence of additive
Gaussian noise. This performance is measured by minimum probability
of error in the case of discrete messages and minimum
mean-square error in the continuous case. The organization of this
little book is as neat and concise as Kotel’nikov’s mathematics.
Part I serves to set the framework of the problems considered and
to sharpen the few elementary mathematical tools used in the rest of
the book. These tools consist of: 1) the expansion of signals defined
in the interval [−T/2, T/2] in terms of a set of orthonormal functions,
2) the use of this expansion to derive the geometrical properties
of a signal space, and 3) the corresponding expansion of shot noise
and the statistical properties of such noise in signal space. Part II
then uses these tools to treat the transmission of discrete messages
as an hypothesis testing problem. Part III deals with the analysis
of the transmission of a continuous parameter value treated as a
statistical estimation problem. In terms of what we now know,
much of Part III may be generalized. This does not, however,
detract from the usefulness of Kotel’nikov’s analysis. Finally,
Part IV deals with the extension of the techniques of statistical
estimation to cover the analysis of the transmission of continuous
time functions. Most of the results in Part IV are restricted to the
case when the signal-to-noise ratio is high.
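
The signal-space treatment of discrete messages can be sketched as follows. This is a minimal illustration, not the book's own procedure: it assumes sampled waveforms, equally likely messages and white Gaussian noise, in which case choosing the nearest signal point minimizes the probability of error. The basis vectors, signal points and function names are hypothetical.

```python
import math

def project(samples, basis):
    """Coefficients of a sampled waveform on a set of orthonormal basis vectors."""
    return [sum(s * b for s, b in zip(samples, vec)) for vec in basis]

def nearest_message(received_point, signal_points):
    """Index of the signal point closest (Euclidean distance) to the received point."""
    return min(range(len(signal_points)),
               key=lambda i: math.dist(received_point, signal_points[i]))

# Hypothetical orthonormal basis on four samples and three message points.
basis = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, -0.5, -0.5]]
signal_points = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0)]

# A received waveform: message 0 plus a small disturbance.
received = [0.6, 0.4, 0.5, 0.5]
decision = nearest_message(project(received, basis), signal_points)
```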

Each of the last three parts begins with a general analysis of the
problem and a derivation of the optimum method of detection.
After this, the author presents a large number of examples of practical
communication systems to illustrate the theory. In some
cases, the examples are compared with a suboptimum method of
detection. Each part is concluded with a discussion of the interpretation
of the results in terms of the geometrical properties of signal
space.

At a time when the space lag is exciting so much comment, it
is interesting to examine this volume with a view to the probable
existence of a “communication theory lag.” The framework within
which Kotel’nikov solves his problems is one which we now call
statistical decision theory. Statistical decision theory was developed
in this country by Wald shortly before Kotel’nikov first published
his results in 1947. It was eight years later, however, before Middleton
and Van Meter became the first to apply Wald’s ideas to communication
problems in this country.

Two other Sputniks of note in this volume are the geometrical
interpretation of signals and the analysis of nonwhite noise filtering
problems by the use of inverse filters. The former was first published
in this country by Shannon in 1949, the latter by Bode and Shannon
in 1950.

It seems fair to say that at the present time there is no single
major concept (with the possible exception of the material in Section
9-2) in Kotel’nikov’s work which is not known in this country.
As noted above, this was not true as little as five years ago. This
is not to say that the results have all been worked out in this country—they
have not.

Norman M. Abramson
Dept. of Elec. Engrg.
Stanford University
Stanford, Calif.




R. A. Epstein 
Jet Propulsion Lab. 
Pasadena, Calif. 


S. V. C. Aiya
Indian Institute of Science 
Bangalore 12, India 


Canada 


D. A. Bell
University of Birmingham 
Birmingham, England 




I. Cederbaum 
Ministry of Defence 
Box 7063, Hakirya 
Tel Aviv, Israel 




This Section of the issue is devoted to abstracts of material which may be of interest to PGIT members. Sources 
are Government, Industrial and University reports, and books and journals published outside of the United States. 
Readers familiar with material of this nature which is suitable for abstracting are requested to communicate the 
pertinent information to one of the Editors or Correspondents listed below. 


Editors 


G. L. Turin 
Hughes Research Labs. 
Malibu, Calif. 


Correspondents 


L. L. Campbell 
Essex College 
Windsor, Ontario 


W. Meyer-Eppler 
Universitat Bonn 
Bonn, Germany 


H. Mine 

Defense Academy 
Obaradai, Yokosuka 
Japan 


G. Francini 

Pe Sebel 

Viale di Trastavere, 189 
Rome, Italy 


Comb Filters Using Delay Elements for Periodic Signal Detection—
Aoyagi and T. Kasami (in Japanese). (J. Inst. Elec. Commun.
Engrs. Japan, vol. 43, pp. 32-38; January, 1960.)


This paper is concerned with optimum comb filters for the improvement
of the SN Ratio of a repeating signal corrupted by
noise. An optimum comb filter is defined as a filter consisting of
a given number of delay elements having a delay time equal to
the signal period, τ, and several adders, multipliers and phase
inverters, such that a maximum SN Ratio improvement is obtained,
subject to the stability condition.

A comb filter containing N delay elements has a transfer function
of the form $T(s) = T(z) = f(z^{-1})/g(z^{-1})$, where $z = e^{s\tau}$. The following
assumptions are made: 1) the stability requirement can be
expressed as a limitation on the locations, $p_k$, of the poles of $T(s)$,
i.e., $\operatorname{Re} p_k < -\delta = \log a < 0$, where a depends on the accuracy
and stability of the elements; and 2) the noise is additive and
uncorrelated with the signal, and its power spectrum may be considered
to be flat in intervals $2\pi n/\tau < \omega < 2\pi(n + 1)/\tau$, n an
integer.

It is proved that the signal-to-noise power ratio improvement
factor, G, is given by $G = T^2(1)$/the sum of the residues of the
poles of $T(z)T(z^{-1})z^{-1}$ inside the unit circle $|z| = 1$. For a given
N and a, a simple method to obtain the optimum transfer function
and its corresponding G is found.
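
A numerical sketch of the improvement factor may help. For a stable filter with real impulse-response samples h_n spaced one signal period apart, the contour integral that the residue sum evaluates equals the sum of the squared samples (Parseval's relation), so G can be computed as the squared dc gain divided by that sum. The two filters below, an (N + 1)-tap averaging comb and a single-pole recursive comb with feedback gain a, are illustrative choices made here and are not claimed to be the paper's optimum designs.

```python
def snr_improvement(impulse_response):
    """G = T(1)^2 / sum(h_n^2) for taps h_n spaced one signal period apart.

    Assumes noise samples one period apart are uncorrelated, so the noise
    power gain is the sum of squared taps.
    """
    signal_gain = sum(impulse_response) ** 2
    noise_gain = sum(h * h for h in impulse_response)
    return signal_gain / noise_gain

def averaging_comb(n_delays):
    """Nonrecursive comb: equal-weight sum of the input and n_delays delayed copies."""
    return [1.0] * (n_delays + 1)

def recursive_comb(a, n_terms=2000):
    """Truncated impulse response a**n of T(z) = 1 / (1 - a * z**-1), with 0 < a < 1."""
    return [a ** n for n in range(n_terms)]

g_average = snr_improvement(averaging_comb(4))    # close to N + 1 = 5
g_recursive = snr_improvement(recursive_comb(0.9))  # close to (1 + a)/(1 - a) = 19
```

The recursive case shows why the stability margin a matters: as a approaches 1 the improvement (1 + a)/(1 − a) grows without bound, but so does the sensitivity to element tolerances.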

The modification for the case of a finite repetition of the signal
is discussed, and it is shown that if N is very much less than the
number of repetitions of the signal, results similar to those for
the ideal case hold. The last section is devoted to a discussion of
the deviation from the ideal case due to inaccurate delay times,
and a simple evaluation formula for the deviation is derived.


Principles of Sorting—D. A. Bell (in English). (Computer J., vol.
1, pp. 71-77; July, 1958.)


The entropy measure of disorder, and the reduction of entropy
by use of information, are shown to be applicable to the task of
sorting items from random sequence to a specified sequence. In
particular, the number of operations in sorting by merging illustrates
the relation between number of binary selections and reduction
of entropy. In sorting on electronic computers, one may
wish to exchange storage requirement against number of operations.
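
The relation between binary selections and entropy can be checked on a small case. The sketch below, written for this note rather than taken from the paper, counts the comparisons used by a merge sort and sets the average over all orderings of n items against log2(n!), the entropy in bits of a uniformly random ordering. Each comparison yields at most one bit, so the average count cannot fall below that figure.

```python
import math
from itertools import permutations

def merge_sort(items):
    """Return (sorted list, number of binary comparisons used)."""
    if len(items) <= 1:
        return list(items), 0
    mid = len(items) // 2
    left, left_count = merge_sort(items[:mid])
    right, right_count = merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    comparisons = left_count + right_count
    while i < len(left) and j < len(right):
        comparisons += 1  # one binary selection
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons

def ordering_entropy_bits(n):
    """Entropy of a uniformly random ordering of n distinct items."""
    return math.log2(math.factorial(n))

def average_comparisons(n):
    """Mean comparison count of merge_sort over all n! orderings."""
    orderings = list(permutations(range(n)))
    return sum(merge_sort(p)[1] for p in orderings) / len(orderings)
```

For example, `average_comparisons(5)` can be compared directly with `ordering_entropy_bits(5)`.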


Spectrum of Television Signals—D. A. Bell and G. E. D. Swann 
(in English). (Wireless Engr., vol. 33, pp. 253-256; November, 1956.) 


Analysis of video signals from the BBC transmitter at Sutton 
Coldfield showed that sidebands are grouped round the harmonics 
of line frequency with 50-cps spacings, but since successive line 
harmonics differ by a multiple of 25 cps, the fringes of these groups
overlap so as to give components at 25-cps spacing in the center 
of each interline gap. The amplitude of video component falls 
rather faster than as 1/f. Data are given for the statistical distri- 
bution of instantaneous amplitudes at particular video frequencies. 


The Rate of Transmission of Information in Pulse-Code Modulation 
Systems—A. R. Billings (in English). (Proc. IEE, pt. C, vol.
105, pp. 444-447; September, 1958.) 


By calculating the information communicated through a noisy 
PCM channel (taking account of the remaining equivocation), it 
is deduced that the rate could be 70 to 80 per cent of
channel capacity, though with low error rates it is less than 33 
per cent. It is suggested that error-correcting codes be devised to 
allow working in the condition of high equivocation of individual 
digits. 


Communication Efficiency of Vocoders—A. R. Billings (in English). 
(Electronic and Radio Engr., vol. 36, pp. 449-453; December, 1959.) 


Spoken English of high intelligibility is taken to be equivalent 
to printed English and hence Shannon’s work (“Prediction and
Entropy of Printed English”) can be used to infer that the communication
rate of speech is approximately 20 bits/sec. An SN Ratio
of 24 db is postulated for high intelligibility, and compared with 
the Shannon channel capacity, a vocoder has an efficiency of the 
order of 1 per cent. Efficiency is improved by introducing volume 
limiting so that the intensities of signals indicating the presence of 
various formants are independent of the formant intensities. In 
the simple ‘low-power vocoder,” the speech is modulated on to 
a 10-ke carrier which is then limited before spectral analysis. In the 
“tertiary low-power vocoder,’’ separate AGC amplifiers are pro- 
vided in each of the three frequency channels which are expected 
to receive high-intensity formants. 




Radar Range Performance—D. L. Drukey and L. C. Levitt.
(Hughes Aircraft Co., Culver City, Calif., Technical Memorandum 
560, August 1, 1957; revision of Technical Memorandum 277, 
April 15, 1954.) 


This report treats the case of a pulse radar system in which 
several echoes are obtained from the object as the radar beam scans 
over it. The best method of processing the receiver output so as 
to distinguish signal plus noise from noise alone is determined. 
Various types of fading echo signals, as well as constant ones, 
are considered. Expressions are obtained which permit calculation 
of the probability of detecting the target as a function of range 
and the radar parameters. Several nonoptimum processing schemes 
are also investigated, as are schemes which make use of variable 
radar parameters to obtain better performance. The question of 
choosing radar parameters is also studied briefly. Appendixes 
indicate in considerable detail the derivations underlying the results 
presented. 


Noise-Reducing Codes for Pulse-Code Modulation—J. E. Flood 
(in English). (Proc. IEE, pt. C, vol. 105, pp. 391-397; September, 
1958.) 


To overcome the weighting by significance of digit of the noise 
associated with error in a pulse in binary PCM, it is proposed to 
reduce the error-risk on the more significant digits by using more 
than one pulse per digit. The effect of noise is computed for a 
range from 1 to 10 pulses per digit. Alternatively, longer pulses 
could be used for the more significant digits. Since the bandwidth 
must be increased, there is more noise and no over-all improvement. 


Comma-Free Codes—S. W. Golomb, B. Gordon, and L. R. Welch 
(in English). (Can. J. Math., vol. 10, no. 2, pp. 202-209; 1958.) 


This study concerns codes which can be deciphered word by
word without the whole message being available. Consider an
alphabet consisting of the numbers $1, 2, \ldots, n$. A set of k-letter
words is called a comma-free dictionary if whenever $(a_1 a_2 \cdots a_k)$
and $(b_1 b_2 \cdots b_k)$ are in the dictionary the “overlaps” $(a_2 a_3 \cdots a_k b_1)$,
$(a_3 a_4 \cdots a_k b_1 b_2)$, etc., are not in the dictionary. An upper bound
is obtained for the number of words in the dictionary. It is shown
that this upper bound can be attained for odd k < 15, and it is
conjectured that the upper bound can be attained for arbitrary
odd integers k. Some asymptotic results for large n are also obtained.
For odd k and $n \to \infty$, the authors show that the number of words
in the dictionary is approximately $n^k/k$ (as compared with $n^k$
possible words when there is no restriction on overlapping).
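
The definition lends itself to a direct check. The sketch below tests whether a set of equal-length words is comma-free by examining every overlap of every ordered pair. It also evaluates the bound usually quoted for this problem, the number of cyclic classes of k-letter words with full period, (1/k) Σ μ(d) n^(k/d) over the divisors d of k. The abstract does not spell the bound out, so that formula is supplied here from the standard literature, and the helper names are this note's own.

```python
from itertools import product

def is_comma_free(dictionary):
    """True if no overlap of two dictionary words (possibly the same word) is a word."""
    words = set(dictionary)
    k = len(next(iter(words)))
    if any(len(w) != k for w in words):
        raise ValueError("all words must have the same length")
    for a, b in product(words, repeat=2):
        for shift in range(1, k):
            if a[shift:] + b[:shift] in words:
                return False
    return True

def mobius(d):
    """Moebius function of a positive integer d."""
    result, p = 1, 2
    while p * p <= d:
        if d % p == 0:
            d //= p
            if d % p == 0:
                return 0  # squared prime factor
            result = -result
        p += 1
    return -result if d > 1 else result

def upper_bound(n, k):
    """Number of full-period cyclic classes of k-letter words over n symbols."""
    total = sum(mobius(d) * n ** (k // d) for d in range(1, k + 1) if k % d == 0)
    return total // k
```

For instance, `{"001", "011"}` passes the check and meets `upper_bound(2, 3)`, while `{"010", "101"}` fails because the overlap of 010 followed by 101 contains 101.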


The Effect of Random Noise Background upon the Detection of a 
Random Signal—H. S. Heaps (in English). (Can. J. Phys., vol.
33, pp. 1-10; January, 1955.) 


A noise distributed in phase and power according to a Rayleigh 
law is studied in terms of its effect upon the detectability of a 
signal of similar phase and amplitude distribution. An expression 
is derived for the probability distribution of the ratio of the power 
of the signal plus noise to that of the noise in the absence of the 
signal. The corresponding result is given for the ratio of the averages 
over several observations. Also derived is the probability distri- 
bution of the fractional change in noise plus signal power due to 
a given fractional change in signal power. 


The Effect of a Random Noise Background upon the Detection 
of a Sinusoidal Signal—H. S. Heaps (in English). (Can. J. Phys., 
vol. 33, pp. 509-520; August, 1955.) 


A sinusoidal signal is examined after reception upon a noise 
background of random phase and power distributed statistically 
according to a Rayleigh law. An expression is obtained for the 
probability distribution of the ratio of the power of the signal 
plus noise at any instant to the power of the noise which would 
be received at that instant in the absence of the signal. The corre- 
sponding result is given for the ratio of the averages over several 
observations. The equations contain as a parameter the ratio of 
the signal power to the mean noise power. When each observed 
value of the power is the average over a small time interval, the 
formulas are applicable provided the noise and the signal have 
the same frequency. A similar analysis is presented to deal with 
the case in which the noise and the signal have different frequencies 
and in which each observation is the average over a small time 
interval. Comparison with the results of a previous paper, in which
the signal was assumed to have a Rayleigh distribution in phase
and power, indicates the effect of extreme fluctuation of an originally
sinusoidal signal upon its resultant with a random noise background.


Optimum Filter Functions for the Detection of Pulsed Signals
in Noise—H. S. Heaps (in English). (Can. J. Phys., vol. 36, pp.
692-703; June, 1958.)


This paper is concerned with the optimum method of processing
a signal received upon a background of noise. Determination is
made of the transfer function of the process that maximizes the
ratio of the average power of n successive samples of the output
signal to the mean output noise power. For sufficiently large values
of n, the ratio is a close approximation to the ratio of the signal-to-noise
energy contained in a sample of the output over a finite
time. The maximum ratio is calculated when the input is a rectangular
dc pulse upon white noise and when it is a cosine pulse
upon nonwhite noise.

The transfer function to maximize the ratio of signal voltage
to noise power was determined in a previous paper. It is found
that for the rectangular pulse the two methods of optimization
lead to very similar ratios of signal-to-noise energy and that the
third-order low-pass Butterworth filter produces a ratio of signal-to-noise
energy that lies within a few per cent of the
maximum. Such is not the case for the cosine pulse.


Optimum Network Functions for the Sampling of Signals in Noise—
H. S. Heaps and M. R. McKay (in English). (Proc. IEE, pt. C,
vol. 105, pp. 438-443; September, 1958.)


In contrast to the classical “detection” principle of producing
a maximum value of SN Ratio at a single instant, it is often preferable
to take the average of a number of samples or to take the
average over a continuous interval. Filters are specified for these
tasks and for inputs consisting of: 1) rectangular pulse upon white
noise, and 2) cosine pulse upon nonwhite noise.


On the Interpolation and Prediction of Signals plus Noise 4 
Infinite and Finite Smoothing Times—D. McDonnell and R. 
Perkins (in English). (Proc. IEE, pt. C, vol. 106, pp. 47-54; March,
1959.) 


The weighting function h(t) which represents the operation of
the desired filter is separated into parts, h1(t) which is free from
impulses and h2(t) containing all the impulses. The resulting form
of the Wiener-Hopf equation is then solved by the Laplace transform
and two examples are worked for the usual requirement of
minimizing the mean-square error, averaged over infinite time.
Section III deals with the alternative requirement to minimize the
ensemble average of the error at a specified time after the application
of signal to the circuit.


An Emphasis Scheme which is Information Theoretically Matched
to a Continuous Communication Channel—H. Miyakawa (in
Japanese). (J. Inst. Elec. Commun. Engrs. Japan, vol. 42, pp.
1220-1226; December, 1959.)


It is well established that the apparent noise spectrum of a
continuous communication channel can be transformed by an
emphasis scheme. However, the ordinary emphasis scheme is
inefficient from the information theoretical point of view. This
paper presents a new emphasis scheme which is matched to the
channel information-theoretically.

The message signal, which is assumed to be white, is sampled
at Nyquist intervals and its sample values are supplied to a classical
emphasis circuit, whose characteristics may be written as

$$S(\omega) = 1 + \sum_n c_n e^{-jn\omega\tau},$$

where τ is the Nyquist interval. The power of the emphasized
sample value is greater than that of the original message by

$$10 \log_{10} \Bigl(1 + \sum_n c_n^2\Bigr),$$

so the emphasized sample values cannot be sent into a channel
when channel power is limited. This is the reason that the classical
emphasis scheme is inefficient. In the new emphasis scheme, the
emphasized sample values are further processed by a “sawtooth”-type
nonlinear circuit. The inverse nonlinear circuit is placed in
the de-emphasis system of the receiver.

It is shown that the new emphasis scheme enables one to transform
a channel with a given noise spectrum into a channel with
an arbitrary noise spectrum, within the limitations of information
theory, even if the channel power is limited.
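
A small sketch of the two ingredients named in this abstract follows, under assumptions that should be stated plainly. The classical emphasis circuit is modelled as a tapped delay line acting on the samples, with hypothetical tap weights c_n following a leading unit tap, and the “sawtooth” nonlinearity is modelled as a wrap of each value into a fixed interval. The abstract does not give the actual circuit, and the receiver's inverse operation is not attempted here.

```python
import math

def emphasize(samples, taps):
    """y[i] = x[i] + sum over n of taps[n-1] * x[i-n] (tapped delay line)."""
    out = []
    for i, x in enumerate(samples):
        y = x
        for n, c in enumerate(taps, start=1):
            if i - n >= 0:
                y += c * samples[i - n]
        out.append(y)
    return out

def power_increase_db(taps):
    """Power gain in db for white (uncorrelated) input samples."""
    return 10 * math.log10(1 + sum(c * c for c in taps))

def sawtooth(x, amplitude):
    """Wrap x into the interval [-amplitude, amplitude)."""
    return (x + amplitude) % (2 * amplitude) - amplitude

taps = [1.0]                      # hypothetical single extra tap
gain_db = power_increase_db(taps)  # about 3 db
wrapped = [sawtooth(y, 1.0) for y in emphasize([0.9, 0.8, -0.7, -0.9], taps)]
```

The wrap keeps every transmitted value inside the power-limited range whatever the emphasis gain, which is the property the abstract attributes to the new scheme.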




Logical Elements Based on a Majority Decision Principle and the
Complexity of their Circuits—S. Muroga (in Japanese). (J. Inst.
Elec. Commun. Engrs. Japan, vol. 42, pp. 993-1000; November,
1959.)


A logical element based on a majority decision principle is defined
here as an organ such that a majority of the binary values of its
input decides a binary value of its output. Theoretical aspects
of logical elements of this sort were discussed by McCulloch and
Pitts in 1943 and later by von Neumann in 1959, where neurons
were models in both cases. This paper does not overlap with these
papers. That is, the introduction of a concept of unequal input
coupling amplitudes gives a new aspect of this theory which may
be of importance in engineering applications. The parametron
works exactly on this principle and we may also be able to construct
an element of this sort with other components like diodes.

In this paper, after the description of a mathematical model
for the element, with definitions of a threshold and a total coupling
number of the element, it is shown that when unequal coupling
amplitudes of integral values are assigned to the inputs, the element
can represent various functions, not only symmetrical but also
asymmetrical, according to the combination of values of coupling
amplitudes. Though the number of these functions is limited for
a specified number of input variables, even a single element based
on the majority decision principle can represent a fairly complex
function. A theorem which specifies functional forms which can
be represented by a single element is given.
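
The effect of unequal coupling amplitudes can be seen in a few lines. The sketch below models the element as a weighted sum of 0/1 inputs compared with a threshold. This encoding, the weights and the function names are assumptions made for illustration; the paper's own model, with its total coupling number, may be set up differently.

```python
from itertools import product

def threshold_element(weights, threshold):
    """Element whose output is 1 when the weighted sum of its inputs reaches the threshold."""
    def element(*inputs):
        return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)
    return element

def truth_table(element, n_inputs):
    """Map every 0/1 input combination to the element's output."""
    return {bits: element(*bits) for bits in product((0, 1), repeat=n_inputs)}

# Equal couplings: an ordinary three-input majority, a symmetrical function.
majority3 = threshold_element((1, 1, 1), 2)

# Unequal integral couplings: the same element type now gives x1 OR (x2 AND x3),
# which is not symmetrical in its inputs.
weighted = threshold_element((2, 1, 1), 2)
table = truth_table(weighted, 3)
```

A single element thus covers a function that would otherwise need an AND stage followed by an OR stage.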

Consequently, synthesis of a switching circuit for a given function
with such elements has economical advantages because of the few
elements required and the high speed of operations due to the
small number of cascaded elements required from the input to
the output. The discussion of synthesis is divided into two cases,
according to the restriction of the maximum number of inputs
which are coupled to any element in the circuit. In the first case,
where there is no restriction, the required number of elements is
greatly reduced to construct the circuit for a given function. In
the second case, where the maximum number of inputs to any
element in the circuit is specified as a certain value, for example,
five in this paper, the synthesis still requires fewer elements than
the required number of relay contacts which was shown by Shannon.
Last, it is shown that unique features of the majority decision
principle are indicated in the synthesis of a circuit for a symmetrical
function or a partially symmetrical function, requiring extremely
few elements.


Statistical Study of Fading, Diversity Effects and the Improvement
Characteristics of Diversity Receiving Systems—M. Nakagami
(in Japanese). (Shukyosha Co., Kobe, Japan; 1947.)


This book summarizes the principal results of a series of the
author’s statistical studies, performed during the years from 1935
to 1943, on fading, diversity effects and the improvement available
from a diversity system. The contents are divided into two parts:
experimental studies (Chapters 1 to 5) and theoretical studies
(Chapters 6 to 11).

In Chapter 1, the necessity for a statistical consideration of
fading is emphasized, and some methods of treating the characteristics
of fading and diversity effects, as well as the improvement
characteristics available from diversity reception, are presented.
In Chapter 2 are discussed the new photometric and some other
simple methods of observing fading and diversity effects statistically.
Some statistical properties of fading and diversity effects, observed
by the above method, are shown. In Chapter 3, the author describes
the characteristics of space, frequency and polarization diversity
effects observed in the HF band with various propagation distances,
directions and frequencies. In Chapter 4, the observed coherent
character of HF waves is presented. From these, the writer suggests
the structure of the wave coming from various distances of more than
00 km. In Chapter 5, the characteristics of the improvements
obtained in space, frequency, and polarization diversity systems
are shown and compared with each other. In Chapter 6, the author 
presents a mathematical method of treating fading and diversity 
effects. In Chapter 7 are established general theories of the inter- 
ference, attenuation and polarization types of fading as well as of 
mixed types of fading. The observed intensity distribution is com- 
pared with that theoretically derived. In Chapter 8, general theories 
of coherence and the space diversity effects are discussed in detail. 
The influence of the correlation, due to the wave picked up by 
feeder lines, on the magnitude of the space diversity effect observed 
in the input of the receivers, is discussed. In Chapter 9, similar 
theories of polarization diversity are developed. In Chapter 10, 
the writer establishes a general theory which enables one to estimate 
the improvement available from n-diversity combination in various 
systems, such as linear addition, switching, etc. In Chapter 11, 
the improvement characteristics of space and polarization systems 
used for long distance HF communication are discussed in detail. 
The essential points described in this book are summarized in the 
last chapter. 


The m-Distribution as the General Formula of Intensity Distribution 
of Rapid Fading—M. Nakagami, K. Tanaka, and M. Kanehisa 
(in English). (Memoirs of the Faculty of Engineering, Kobe University, 
Japan, no. 4; March, 1957.) 

This paper summarizes the principal results of a series of the 
authors’ statistical studies in the last five years on fading character- 
istics, especially on the intensity distributions due to rapid fading. 

The method of derivation and the important characteristics of 
the m distribution (a form of the gamma distribution), originally 
found in the authors’ experiments and designated by them, are
described. The applicability of this formula to both ionospheric 
and tropospheric modes of fading is well confirmed by some experi- 
mental results. Its theoretical background is also discussed in 
detail. A theoretical interpretation of the log-normal distribution 
is given on the basis of this formula. An extremely simplified method 
of estimating the improvement characteristic of various systems of 
diversity reception is presented. The mutual dependences between 
the m formula and other basic distributions are fully discussed. 
Some generalized forms of the basic distributions and their de- 
pendences on the m form are also investigated. Two methods of 
approximating a given function with the m distribution are also 
shown. 

The joint distributions of two variables following the m dis- 
tribution are derived. Using this, a unified theory of diversity 
effects is established. These theories have enabled the authors to 
estimate the improvement characteristics due to diversity reception 
for two correlated signals having various fading ranges. 

An entirely new shorthand method of observation of fading 
is proposed based on the theory of the m distribution. Using this 
apparatus, the authors are now obtaining much valuable informa- 
tion on fading. 
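
For readers who wish to experiment with the m distribution, the sketch below gives its density in the form now commonly quoted, f(r) = 2 m^m r^(2m−1) exp(−m r²/Ω) / (Γ(m) Ω^m) for the envelope r, with Ω the mean-square value. It also draws samples by using the fact that r² is then gamma distributed. The abstract itself states only that the distribution is a form of the gamma distribution; the parameter names and the sampling routine are this note's own.

```python
import math
import random

def m_density(r, m, omega):
    """Density of the envelope r >= 0 for fading figure m >= 1/2 and mean-square value omega."""
    return (2 * m ** m * r ** (2 * m - 1) / (math.gamma(m) * omega ** m)
            * math.exp(-m * r * r / omega))

def sample_envelope(m, omega, rng):
    """Draw one envelope value; r**2 is gamma distributed with shape m and scale omega/m."""
    return math.sqrt(rng.gammavariate(m, omega / m))

def mean_power(m, omega, n_samples, seed=0):
    """Sample estimate of E[r**2], which should be close to omega."""
    rng = random.Random(seed)
    return sum(sample_envelope(m, omega, rng) ** 2 for _ in range(n_samples)) / n_samples
```

Setting m = 1 reduces the density to the Rayleigh law, 2r/Ω · exp(−r²/Ω), and larger m corresponds to shallower fading.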


An Analysis of Non-Mathematical Data Processing—H. A. Newman 
(in English). (In “Mechanization of Thought Processes,” Her
Majesty’s Stationery Office, London, England, p. 865; 1959.) 


A great deal of data-processing involves the recognition of pattern 
and the judgment whether patterns are alike. Patterns can be 
built up from elementary marks, but the number of arrangements 
which can be constructed from a given set of elements of reason- 
able extent is so large that matching against an exhaustive set is 
impracticable. It is important to use the regularities of the patterns 
that arise in practice, and to have readily available in store common 
subgroups from which the larger patterns can be constructed. The 
store organization and structure should be modified with time as 
experience shows which of the whole range of patterns are the most 
probable. If a machine cannot find from its store patterns to match 
those being presented to it, it must create new patterns to try, 
and to do this it must be noisy. 


A New Spectrum Computer “Meriac-1-F” for the Analysis of
Recorded-Curve Data—M. Nishida and T. Furuhata (in Japanese). 
(J. Inst. Elec. Commun. Engrs. Japan, vol. 42, pp. 1045-1050; 
November, 1959.) 

In this paper the outline of the data-processing machine 
“Meriac-1-F” is described and some analytical results are shown.

This machine has been designed and constructed to reduce
manual efforts and obtain calculated results speedily, and functionally
consists of two sets of apparatus: the input device “Meriac-1-F-100”
and the output device “Meriac-1-F-200.” The former is
a kind of information-transforming device which automatically
reads the values from a curve of complicated data on a given chart 
at high speed without missing any physical information and records 
them on a six-unit binary digital tape. 

The output device is an analog computer for Fourier analysis 
which automatically reads the given binary digital tape at ultra-high
speed and successively records at high speed the amplitude of
the wave for each component frequency. 


Modulation by Random and Pseudo-Random Sequences—R. C. 
Titsworth and L. R. Welch (in English). (Jet Propulsion Lab., 
Pasadena, Calif., Progress Rept. 20-387; June 12, 1959.) 


This report deals with the modulation of signals by discrete 
random and random-like sequences which may change state only 
at integral multiples of some basic time division f. The signals 
may be modulated (sampled) in many fashions, depending mainly 
upon the types of sequences and signals available, the desired 
output phenomena, and the sequential rate. 

In general, a sequence may sample a set of signals at random, 
or it may sample in some fixed deterministic fashion. Furthermore, 
deterministic processes may be constructed to possess certain 
random-like qualities. Special attention is given to random Markov 
chains and linear pseudo-random sequences; the signals selected 
for modulation are not restricted to any one class, and examples 
are given for sinusoids and square waves. 

Specifically, the effects of carrier-signal waveform and type of 
sequence upon the over-all power spectrum are considered. In the 
case of sinusoidal modulation, the effect of phase shift is investigated. 
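
A linear pseudo-random sequence of the kind referred to can be produced by a short shift register with modulo-2 feedback. The sketch below is a generic illustration rather than the report's construction: the three-stage register, its tap positions and its starting state are chosen here because they give the maximal period of 7. The periodic autocorrelation of the resulting ±1 sequence takes the value −1 at every nonzero shift, which is the random-like property that shapes the power spectrum.

```python
def lfsr_sequence(taps, state, length):
    """Output of a shift register whose new first stage is the XOR of the tapped stages."""
    state = list(state)
    out = []
    for _ in range(length):
        out.append(state[-1])          # read the last stage
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = [feedback] + state[:-1]  # shift by one stage
    return out

def periodic_autocorrelation(bits, shift):
    """Autocorrelation of one period, with bits mapped 0 -> +1 and 1 -> -1."""
    levels = [1 - 2 * b for b in bits]
    n = len(levels)
    return sum(levels[i] * levels[(i + shift) % n] for i in range(n))

sequence = lfsr_sequence(taps=(1, 2), state=[1, 0, 0], length=14)
period = sequence[:7]
```

Multiplying a carrier by the ±1 version of `period`, one element per basic time division, gives a simple instance of modulation by such a sequence.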


Conditional Probability Computing in a Nervous System—A. M. 
Uttley (in English). (In “Mechanization of Thought Processes,”
Her Majesty’s Stationery Office, London, England, p. 119; 1959.) 


The author examines the hypothesis that the organization of 
nervous systems is based on the two principles of classification 
and conditional probability. Given a system of binary inputs, the 
facility of connecting the inputs in nearly all possible ways, and 
the availability of delays, it is possible to construct a network 
having separate indicators for each of a number of spatial and 
temporal patterns. This is a classification network, but it becomes 
a conditional-probability system if in addition, the presentation 
of any one pattern causes the computation of the conditional 
probability of every other pattern. The building up of records of 
conditional probabilities, or “learning,” implies some mechanism
of slow change in the state of a neuron. Electric-circuit models are
suggested which exhibit slow recoveries analogous to the spon- 
taneous recovery of conditioned reflexes. 


On the Distribution of the Product of Diode Detector Waveforms— 
E. L. R. Webb (in English). (Can. J. Phys., vol. 34, pp. 679-691; 
July, 1956.) 


The probability distribution of the product of two waveforms 
such as come from the diode second detectors of radio receivers 
is examined over the whole range of SN Ratios. Computed curves of
probability density are given for small and moderate values of
SN Ratio and the limiting form for large SN Ratio is indicated.
The pure noise case is the only one immediately available in terms 
of tabulated functions. Compared to the Rayleigh distribution 
it rises much faster, reaches its maximum sooner and lower, and 
decays much more slowly. The very large SN Ratio case approaches 
an impulse function. Estimates of mean and variance are given. 


A Game Theoretic Model of Communication Jamming—L. R. 
Welch (in English). (Jet Propulsion Lab., Pasadena, Calif., Memo. 
20-155; April 4, 1958.) 


The communication jamming problem is analyzed from a game- 
theoretic viewpoint. A model is described in which both the com- 
municator and jammer transmit real numbers at equally spaced 
intervals which are taken to be the unit of time. Both parties have 
power limitations, and the jammer has the entire past history of 
the communicator’s signal available for analysis. 

It is shown that, if the signal power is 1 and the jamming power 
is J, the communicator can transmit at an information rate 
arbitrarily close to ½ log₂ [(1 + J)/J] bits per unit time with an 
arbitrarily small probability of error of a message block. 
Furthermore, the jammer can prevent the communicator from doing any 
better than this. 
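
As a numerical illustration, the sketch below evaluates the rate for a few jamming powers, taking the expression to be (1/2) log2[(1 + J)/J] with unit signal power. That reading of the printed formula is an assumption; it matches the capacity of a Gaussian channel with noise power J. The function name is hypothetical.

```python
import math


def jamming_rate_bits(jamming_power):
    """Rate in bits per unit time for unit signal power and jamming power J.

    Assumes the rate has the form (1/2) * log2((1 + J) / J).
    """
    if jamming_power <= 0:
        raise ValueError("jamming power must be positive")
    return 0.5 * math.log2((1.0 + jamming_power) / jamming_power)


if __name__ == "__main__":
    for j in (0.1, 1.0, 10.0):
        print(f"J = {j:5.1f}  rate = {jamming_rate_bits(j):.4f} bits per unit time")
```

Under this reading the rate falls toward zero as J grows and increases without bound as J approaches zero.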




A Class of Definitions of “Duration” (or “Uncertainty”) and the 
Associated Uncertainty Relations—M. Zakai (in English). (Memorandum 
of the Scientific Department, Ministry of Defence, Israel; 
September 14, 1959.) 

A new class of definitions for “time duration” and “bandwidth” 
(or “time uncertainty” and “frequency uncertainty”), in terms of 
norms of L^p spaces, is suggested. Some properties of the definitions 
and the associated uncertainty relations are derived. As examples 
of the application of these concepts, the problem of the approach 
of the probability distribution of shot noise towards the normal 
law and the “beamwidth”—“aperture width” product in antenna 
theory are considered. 


The following two volumes are collections of papers published by 
the A. S. Popov Scientific-Technical Society for Radio Engineering 
and Electrical Communication, Moscow, USSR. They were edited by 
V. I. Siforov. The tables of contents are given in full below, but because 
of space considerations, abstracts are given in only a few cases. All 
papers are in Russian. 


Issue No. 2—1958 


On the Determination of General Technical Characteristics of 
Communication Systems—A. G. Zyuko. 


On the Determination of the Amount of Information I(η, ξ) in 
a Random Object η concerning a Random Object ξ in the Case of 
Continuous Transmission—G. B. Linkovsky. 




On the Theory of an Ideal Receiver in the Sense of Kotelnikov—
B. A. Varshaver. 


Basic Relations in Radio Receiving Systems Employing Integration 
and Filtration of Signals in the Presence of Fluctuation Noise—
N. L. Teplov. 


On the Theory of Radiocommunication Channels with Multipath 
Propagation—V. I. Siforov. 




Estimation of the Largest Possible Value of the Entropy of an 
Unknown Distribution Characterized by Several Moments—B. S. 
Fleishman and G. B. Linkovsky. 


Estimation of the Entropy and Distribution Function of a Scalar 
Random Variable Characterized by Several Sample Moments—
G. B. Linkovsky. 


Several Questions in the Theory of Construction of Error-Correcting 
Codes—L. F. Borodin. 


Several general methods for the construction of error-correcting 
codes are evolved through the use of the concepts and techniques 
of number theory. Specifically, methods for the detection and 
correction of errors are developed, and a single-error-correcting 
code is considered in detail. For the case where the number of 
levels, a, is a prime number, a systematic procedure is given for 
the construction of a number of special codes (e.g., optimal codes 
for an erasure channel; a code in which all sequences differ from 
one another in exactly d = (a − 1)a^(m−1) positions; a single-error-
correcting code). Finally, two circuits for adders mod a are a 
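
The equidistant property mentioned in this abstract can be checked numerically. The sketch below takes the exponent in the distance formula to be m − 1 (an assumed reading) and uses a standard construction chosen only for illustration: for prime a, the code words are the values of every linear form mod a at the a^m − 1 nonzero points. Whether this coincides with Borodin's procedure is not stated in the abstract, and the function names are hypothetical.

```python
from itertools import product


def linear_form_code(a, m):
    """Code words (c . x mod a over all nonzero x), one word per coefficient vector c.

    Assumes a is prime, so that the integers mod a form a field.
    """
    points = [x for x in product(range(a), repeat=m) if any(x)]
    return [
        tuple(sum(ci * xi for ci, xi in zip(c, x)) % a for x in points)
        for c in product(range(a), repeat=m)
    ]


def pairwise_distances(code):
    """Set of Hamming distances taken over all pairs of distinct code words."""
    return {
        sum(u != v for u, v in zip(w1, w2))
        for i, w1 in enumerate(code)
        for w2 in code[i + 1:]
    }


if __name__ == "__main__":
    for a, m in ((2, 3), (3, 2), (5, 2)):
        code = linear_form_code(a, m)
        print(a, m, len(code[0]), pairwise_distances(code), (a - 1) * a ** (m - 1))
```

For each pair (a, m) the set of distances contains a single value, equal to (a − 1)a^(m−1).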


Accumulation of Disturbances and Fading in Main Radio Relay 
Lines—V. I. Siforov. 


Cross-Talk Noise Arising in FM Radio Relay Lines Due to Multipath 
Propagation or Mismatch and Non-Uniformity in Antenna Feeders—
A. V. Prosin. 


On the Analysis of Multichannel Communication Systems with 
FM and Bandwidth Compression—A. V. Prosin. 


Approximate Analysis and Modelling of Accumulation of 
Disturbances in Radio Relay Communication Links—Yu. B. Sindler. 


Parametric Methods in Electrodynamics—L. A. Druzhkin. 

On the Question of Choice of the Form of the Vector-Parametric 
Equation in the Solution of Problems of Distribution of Charge 
on Closed, Cylindrical, Infinite Conductors, and on Linear, Plane, 
Closed Conductors—L. A. Druzhkin. 


Issue No. 8—1959 


I. Theory of Communication Channels with 
Randomly Varying Characteristics. 


Optimal Reception of a Parameter Transmitted over a Channel 
with Additive, Multiplicative and Phase Disturbances—V. I. 
Siforov, B. S. Fleishman, and G. B. Linkovsky. 


Optimal reception of a parameter transmitted over a channel 
with additive Gaussian noise was first considered by V. A. 
Kotelnikov for the case of a single parameter, and by D. Slepian for 
the case of many parameters. In recent years, the growing interest in 
transmission of meteorological and, particularly, geophysical 
parameters, frequently under conditions which have not been explored 
sufficiently (for example, in connection with satellites and 
high-altitude rockets), suggests the need for the development of a theory of 
noise immunity for more general types of channels. 

It appears that a natural generalization of a continuous channel 
is provided by the recent work of V. I. Siforov on channels with 
randomly varying characteristics. Such channels are found in 
connection with tropospheric and ionospheric propagation at UHF 
as well as in ordinary short-wave propagation. In all these cases, 
multipath propagation of radio waves takes place, with attendant 
random modulation of transmitted signals both in amplitude and 
phase in the presence of internal noises in the receiver. This gives 
rise to channel noise which has multiplicative, additive and 
time-delay components. Previous investigations of channels of this type 
were information-theoretic in nature and were concerned with 
the estimation of their capacities. Present work is concerned with 
noise immunity in the case of optimal reception of a parameter 
through such a channel, treated as a problem of optimal estimation 
of a parameter in the sense of mathematical statistics. 


On the Ideal Reception of a Parameter Transmitted over a Channel 
Comprising a Small Number of Paths—G. B. Linkovsky. 


On the Problem of Optimal Statistical Estimation of the 
Characteristics of a Multipath Communication Channel—B. S. Fleishman, 
G. B. Linkovsky, and Yu. B. Sindler. 


II. Theory of Information 


On a Method of Linear Coding with Error Correction of 
Transmitted Signals—R. R. Varshamov. 


On the Comparison of Uniform Codes for Binary Transmission—
B. A. Varshaver. 


Construction of an Optimal Code in the Sense of Shannon in the 
Simplest Case of a Binary Noisy Channel—B. S. Fleishman. 


In this work, an optimal code in the sense of Shannon is 
constructed for a binary symmetric channel. The construction is 
accomplished by random selection of M input sequences of length 
n with subsequent elimination of some of them. Cases where such 
elimination can be avoided are considered. Estimates of the 
probability of obtaining an optimal code by a random selection as well 
as the probability of correct decoding for finite n are obtained. 
It is shown that in all cases, the former probability tends to unity 
with increase in n much faster than the latter probability. 
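
The general idea of random selection followed by elimination can be sketched as follows. The abstract does not state the elimination rule, so the rule used here (drop any drawn word closer than a chosen Hamming distance to a word already kept) is an assumption made for illustration, as are the function names and the parameter d_min.

```python
import random


def hamming(u, v):
    """Number of positions in which two equal-length words differ."""
    return sum(a != b for a, b in zip(u, v))


def random_code_with_elimination(M, n, d_min, rng):
    """Draw M random binary words of length n, then eliminate near-duplicates.

    A drawn word is kept only if it is at least d_min away from every
    word kept so far (an assumed elimination rule).
    """
    drawn = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(M)]
    kept = []
    for word in drawn:
        if all(hamming(word, k) >= d_min for k in kept):
            kept.append(word)
    return kept


def decode(received, code):
    """Minimum-distance decoding; maximum likelihood on a binary symmetric
    channel with crossover probability below 1/2 and equally likely words."""
    return min(code, key=lambda c: hamming(received, c))


if __name__ == "__main__":
    code = random_code_with_elimination(M=8, n=15, d_min=3, rng=random.Random(0))
    print(len(code), "words kept out of 8 drawn")
```

With d_min = 3, any single channel error in a transmitted word is corrected by the decoder, since the received word is then at distance 1 from the sent word and at least 2 from every other word.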


Experimental Study of Statistical Characteristics of Patterns—
N. Osher. 


III. Theory of Radio Relay Communication Laws. 


On the Influence of the Form of Correlation Function of 
Inhomogeneous Turbulence in the Troposphere on Scatter Propagation 
at UHF—A. V. Prosin. 


Investigation of the Properties of Probability Distributions of 
Fading and Disturbances in Radio Relay Communication Lines—
Yu. B. Sindler. 


A Device for Measuring the Correlation Coefficient in Long Line 
Communication at UHF—I. P. Levshin and G. I. Slobodenuk. 


The following papers were published singly by the Professional 
Group on Information Theory (I) and the Professional Group on 
Automata and Automatic Control (A) of the Institute of Electrical 
Communication Engineers of Japan, 2-8, Fujimicho, Chiyodaku, 
Tokyo, Japan. All are in Japanese, but English abstracts are given 
below when available. The affiliation of each author is given so that 
interested readers may contact the author directly for further information. 


Basic Theory for Pattern Recognition (A; December 10, 1959)— 
T. Iizima (The Electrotechnical Laboratory, 1, 2-chome, Nagata-cho, 
Chiyoda-ku, Tokyo.) 


Theory of Tape-Sorting (I; March 18, 1960)—T. Iizima (See above.) 


Theory of Sequential Machines (A; February 18, 1960)—S. Fujino 
(School of Science, Kyushu University, Hakozaki-machi, Fukuoka, 
Kyushu.) 


A Theory of Waveform Prediction (I; January 19, 1960)—T. Kasami 
(Engineering Department, Osaka University, 9-chome, Higashinoda- 
ku, Miyakojima-ku, Osaka.) 

The problem of selecting out the signal wave from a finite time 
observation is discussed. The prediction operator, which gives 
the predicted waveform based on the observed waveform, is assumed 
to be one of finite dimension which is generally realizable. The 
cost of observation and prediction is partly considered. A minimax 
solution is obtained for the case when the first- and second-order 
statistics and a quadratic loss function are assumed. 


Information-Theoretical Analysis of Classification (I; January 19, 
1960)—Z. Kiyasu and S. Ikeno (Electrical Communication Lab., 
1551 Kichijoji, Musashino-shi, Tokyo.) 


Improvement Efficiency of an Error-Correction System in Short-
Wave Circuits (I; February 26, 1960)—T. Kumagaya, K. Teramura, 
and H. Sakaguchi (Japanese Overseas Radio and Cable System, 
1-5, Ohte-machi, Chiyoda-ku, Tokyo.) 


Proof of Mathematical Theorems Using a Computing Machine 
(A; March 31, 1960)—N. Kuroda (Nagoya University, Chigu- 
sa-ku, Nagoya.) 


Basic Planning of a Vocal Typewriter (A; January 18, 1960)— 
K. Maeda, et al. (Kyoto University, Honcho Yoshida, Sakyo-ku, 
Kyoto.) 


Error Probability in FS Radio Teletype System (I; February 26, 
1960)—S. Miyake, K. Tadenuma, and T. Nakai (Japanese Over- 
seas Radio and Cable System, see above.) 


Sampling Errors in the Measurement of Autocorrelation (I; March 
18, 1960)—H. Miyakawa (Faculty of Engineering, University of 
Tokyo, Bunkyo-ku, Tokyo.) 

Not only the mean and variance, but also the statistical properties 
of y_n as a function of n are calculated, where y_n is an estimate of 
autocorrelation ρ(n). An estimate is derived from a single sample, 
of finite length, of the time series which is assumed to be stationary 
and Gaussian. It is found that the spectral decomposition theorem 
can be applied to the difference y_n − ρ(n), though the difference 
is a nonstationary time series as a function of n. A simple relation 
between the spectral density of y_n − ρ(n) and that of the original 
time series is also derived. 
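
For concreteness, one common way of forming an autocorrelation estimate from a single finite sample is sketched below. The abstract does not say which estimator is analyzed, so the normalization used here (sample mean removed, lagged products divided by the record length and the lag-zero value) is an assumption, and the function name is hypothetical.

```python
def sample_autocorrelation(x, max_lag):
    """Estimates y_0, ..., y_max_lag of the normalized autocorrelation.

    Assumes a record of finite length with nonzero sample variance;
    the estimator form is a common choice, not taken from the paper.
    """
    length = len(x)
    mean = sum(x) / length
    centered = [value - mean for value in x]
    c0 = sum(value * value for value in centered) / length
    return [
        sum(centered[t] * centered[t + lag] for t in range(length - lag))
        / (length * c0)
        for lag in range(max_lag + 1)
    ]


if __name__ == "__main__":
    print(sample_autocorrelation([1.0, 2.0, 3.0, 4.0, 5.0], 2))
```

Because every lag is computed from the same finite record, the errors at different lags are statistically dependent, which is why the behavior of the error as a function of the lag is of interest.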


A Psychological Study for the Improvement of Teletypewriters 
(I; December 19, 1959)—G. Ohwaki and K. Maruyama (Psycho- 
logical Laboratory, Tohoku University, Sakura-koji, Sendai.) 


Programming for Proof of Mathematical Theorems (A; March 
31, 1960)—G. Shimanouchi (Tokyo University of Education, 24 
Ohtsuka, Kubomachi, Bunkyo-ku, Tokyo.) 


Binary Representation of Vowels (I; January 19, 1960)—H. Suzuki 
and M. Oh-izumi (Electrical Communication Laboratory, Tohoku 
University, Sakura-koji, Sendai.) 


Programming for the M-1 Computer for the Proof of Mathematical 
Theorems (A; March 31, 1960)—T. Takasi (Electrical Communi- 
cation Laboratory, 1551 Kichijoji, Musashino-shi, Tokyo.) 


An Investigation of Radar Tracking Error (I; January 19, 1960)— 
K. Tanaka (Engineering Department, Kobe University, 1-chome, 
Mizukasa-cho, Nagata-ku, Kobe.) 


ENGINEERS • SCIENTISTS 


[Illustration: pages from two technical papers by ARL staff members; one legible title is “A Theorem on Cross Correlation Between Noisy Channels,” IRE Transactions on Information Theory.] 


SYLVANIA’S 
Applied Research Laboratory 


Is engaged in diversified, active programs that 
afford broad individual participation 


The Applied Research Laboratory is directing its growing capa- 
bility toward theoretical and experimental investigations that will 
lead to major state-of-the-art advances in the field of military and 
commercial electronic systems. The opportunity for individual 
recognition in this challenging technological area is typified by 
the titles of the two recent technical papers, by ARL staff members, 
which are depicted here. 


If you possess superior qualifications (an advanced degree is desir- 
able) and would like to join this highly professional group, you are 
invited to inquire about career positions in these areas: 


@ INFORMATION & COMMUNICATION THEORY 

@ ELECTROMAGNETIC PROPAGATION 

@ HYPERSONIC GASDYNAMICS @ NEW TECHNIQUE INSTRUMENTATION 
@ MICROELECTRONICS 

@ MATHEMATICAL ANALYSIS & OPERATIONS RESEARCH 


For further information about research work in 
the above areas, and other technical publications 
by ARL engineers, you are invited to write to: 


Dr. L. S. Sheingold 
Director, Applied Research Laboratory 


Waltham Laboratories / SYLVANIA ELECTRONIC SYSTEMS 


Subsidiary of GENERAL TELEPHONE & ELECTRONICS 


100 First Avenue—Room 9-E—Waltham 54, Massachusetts 


NOTICE TO ADVERTISERS 


IRE TRANSACTIONS ON 
INFORMATION THEORY 
will accept advertising. 
For full details contact 
E. K. Gannett, The Institute 
of Radio Engineers, 
Inc., 1 East 79 Street, 
New York 21, N. Y. 


INFORMATION FOR AUTHORS 


Authors are requested to submit editorial correspondence or technical 
manuscripts to the Publications Chairman for possible publication in the PGIT 
TRANSACTIONS. Papers submitted should include a statement as to whether the material 
has been copyrighted, previously published, or accepted for publication elsewhere. 


Papers should be written concisely, keeping to a minimum all introductory 
and historical material. It is seldom necessary to reproduce in their entirety previ- 
ously published derivations, where a statement of results, with adequate references, 
will suffice. 


To expedite reviewing procedures, it is requested that authors submit the 
original and two legible copies of all written and illustrative material. The 
manuscript should be double-spaced, and the illustrations drawn in India ink on drawing 
paper or drafting cloth. Each paper should include a carefully written abstract of 
not more than 200 words. Upon acceptance, papers should be prepared for 
publication in a manner similar to those intended for the PROCEEDINGS OF THE IRE. 
Further instructions may be obtained from the Publications Chairman. Material 
not accepted for publication will be returned. 


IRE TRANSACTIONS ON INFORMATION THEORY is published four times a year, 
in March, June, September, and December. A minimum of one month must be 
allowed for review and correction of all accepted manuscripts. In addition, a period 
of approximately two months is required for the mechanical phases of publication 
and printing. Therefore, all manuscripts must be submitted three months prior 
to the respective publication dates. 


All technical manuscripts and editorial correspondence should be addressed to 
Arthur Kohlenberg, Melpar, Inc., 11 Galen Street, Watertown 72, Mass. Local 
Chapter activities and announcements, as well as other nontechnical news items, 
should be addressed to David Van Meter, Litton Industries, Inc., Waltham, Mass. 




INSTITUTIONAL LISTINGS 


The IRE Professional Group on Information Theory is grateful for the assistance 
given by the firms listed below and invites application for Institutional Listing 
from other firms interested in the field of Information Theory. 


IBM RESEARCH, INTERNATIONAL BUSINESS MACHINES CORP., Yorktown Heights, N. Y. 


Error Correcting & Detecting Codes, Theory of Assemblies & Automata, Information Networks, Reliability 


REPUBLIC AVIATION CORP., Farmingdale, N. Y. 


Aircraft, Missiles, Drones, Electronic Analyzers; U. S. Distr. of Alouette Turbine-Powered Helicopter 


NOTICE TO ADVERTISERS 
The IRE TRANSACTIONS ON INFORMATION THEORY will accept both 
display advertising and Institutional Listings. For full details, contact 
E. K. Gannett, The Institute of Radio Engineers, Inc., 1 East 79 Street, 
New York 21, N. Y. 


